Abstract


Our objective is to give asymptotic expansions for moments of standardized statistics based
on $n$ independent, identically distributed random variables as $n \to \infty$. The basic premise is that a
simple tail condition on the underlying distribution, one which implies that the moments of a standardized
quantile converge to the moments of an appropriate normal distribution, is sufficient to assure the
validity of asymptotic moment expansions for many statistics which are resistant to outliers.

The primary result we present gives sufficient conditions for the validity of moment
approximations based on moments of Taylor's series approximations which are obtained by using
functional differentiation. We apply the theory to some L- and M-estimates and present a Monte Carlo
study to show that the approximations for the variance of statistics based on small to moderate
sample sizes can be quite good.

Prior  to  studying  the  above  general  problem  we  consider  the  problem  of  the  convergence  of 
the  moments  of  a  standardized  quantile  to  those  of  an  appropriate  normal  distribution.  Our  proof 
of  moment  convergence  requires  fewer  non-tail  conditions  on  the  underlying  distribution  than  were 
used  in  previously  published  results.  We  also  extend  the  result  to  show  necessary  and  sufficient  tail 
conditions  on  the  underlying  distribution  for  convergence  of  the  moment  generating  function  of  a 
standardized  quantile  to  that  of  a  normal  distribution. 
Chapter 1

Summary  and  literature  review 


§1.1  Introduction. 

This  chapter  summarizes  the  results  obtained  and  gives  a  literature  review  for  the  two  basic 
problems  which  we  consider.  First  we  consider  the  convergence  of  the  moments  of  a  standardized 
quantile  to  the  moments  of  a  normal  distribution,  and  then  we  move  on  to  summarize  results  on 
asymptotic  expansions  for  moments  of  robust  statistics. 

§1.2  Standardized  quantiles. 

In  chapter  2  we  will  attack  the  problem  of  the  convergence  of  the  moments  of  a  standardized 
quantile to the moments of a normal distribution using direct methods; i.e., we will write down
integral  expressions  for  expectations  and  use  standard  tools  from  analysis  to  obtain  results.  Although 
much  of  the  theory  of  the  rest  of  the  thesis  is  not  heavily  dependent  on  this  chapter,  some  basic 
ideas  are  illustrated  without  the  additional  tools  and  technical  problems  of  later  chapters. 

For  independent,  identically  distributed  random  variables  with  distribution  function  F, 
necessary  and  sufficient  tail  conditions  needed  for  convergence  of  moments  of  standardized  quantiles 
are 

$$0 < a_+ = \liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log x} \quad\text{and}\quad 0 < a_- = \liminf_{x\to\infty} \frac{-\log F(-x)}{\log x}. \tag{1.2.1}$$

Blom (1958), Sen (1959) and Bickel (1967) have considered equivalent conditions, but have not
explicitly defined quantities which are quite as useful as $a_+$ and $a_-$. We will discuss the relation of
the values of $a_+$ and $a_-$ to the existence of moments, the existence of moments of order statistics,
regular variation, and hazard functions, as well as their relation to the convergence of moments of
standardized quantiles. The only other condition on $F$ we require is that it be differentiable at the
quantile of interest. Other authors have required that $F$ be an absolutely continuous distribution,
but we have developed a simple proof which does not require this condition.

Most  of  the  results  of  chapter  2  will  be  concerned  with  the  expectations  of  functions  of 
standardized  quantiles.  This  general  approach  will  allow  us  to  consider  convergence  in  distribution 
and  convergence  of  the  moment  generating  function  as  well  as  convergence  of  moments.  Whereas  the 
result  on  convergence  in  distribution  is  contained  in  results  given  by  Smirnov  (1952)  and  Wretman 
(1978),  and  the  results  on  convergence  of  moments  are  variations  on  previous  results  as  discussed 
above  and  in  chapter  2,  the  result  on  the  convergence  of  the  moment  generating  function  is  believed 
to  be  completely  new. 


§1.3  Robust  statistics. 

In  chapter  3  we  will  extend  the  results  of  chapter  2  in  two  ways.  First,  we  will  show  that  the 
conditions  of  (1.2.1)  are  sufficient  to  assure  convergence  of  moments  of  many  statistics  which  have 
bounded  influence  functions.  Second,  we  give  higher  order  expansions  of  moments  using  functional 
differentiation. 


Previous applications of functional differentiation in statistics have been proofs of versions
of the central limit theorem, the theory of Edgeworth expansions, the law of the iterated logarithm,
and the Berry-Esseen theorem. Serfling (1980), Reeds (1976), and Huber (1981) present surveys on
the applications of functional differentiation in statistics. To our knowledge the theory has not been
used to prove the validity of asymptotic moment expansions.


The  theory  presented  here  involves  showing  that  for  a  functional  statistic  T,  an  underlying 
distribution  F,  and  an  empirical  distribution  function  Fn  an  expansion  of  the  form 


$$E\big[(T(F_n) - T(F))^r\big] = E\Big[\Big(\sum_{j=1}^{k} \frac{1}{j!}\, T_F^{(j)}(F_n - F)\Big)^r\Big] + o\big(n^{-(r+k-1)/2}\big) \tag{1.3.1}$$


is valid under some assumptions. We have used a version of Fréchet differentiation to prove
this result. The condition of Fréchet differentiability on $T$ is a strong one. If a functional $T$
is Fréchet differentiable then the corresponding functional statistic $T(F_n)$ is, in general, resistant
to outliers. It is this fact that allows us to use the same tail conditions which are used for
quantiles to show convergence of moments of many other statistics. That Fréchet differentiability
is a stronger condition than we might like is indicated by the fact that quantiles do not correspond
to a Fréchet differentiable functional, and yet quantiles are outlier-resistant statistics
whose moments converge under the tail conditions we use.




At the end of chapter 3 we give (previously known) results to aid in computing the
right hand side of (1.3.1) to within $o(n^{-(r+k-1)/2})$. In chapter 4 we develop formulas for these
approximations for M-estimates of location which are not scale invariant and for L-estimates. In
particular,  we  give  formulas  for  first  and  second  order  mean  and  variance  approximations  in  these 
cases.  In  chapter  5  we  include  Monte  Carlo  studies  to  test  how  well  the  moment  approximations 
work  in  small  to  moderate  sample  sizes.  We  present  a  small  simulation  study  of  nonparametric 
estimates  of  variance  obtained  through  the  use  of  the  above  mentioned  variance  approximation 
formulas.  The  relation  of  these  estimates  to  the  delta  method  and  the  bootstrap  is  noted.  Finally, 
we  try  one  method  of  extending  our  theory  to  quantiles  and  trimmed  means  to  demonstrate  some 
of  the  limitations  of  our  results. 

Bickel  (1967)  uses  convergence  of  moments  of  quantiles  and  theory  on  Brownian  bridges  as 
his  primary  tools  for  showing  the  convergence  of  moments  of  L-estimates.  We  use  similar  results  on 
quantiles,  but  using  functional  differentiation  as  our  other  basic  tool  allows  us  to  extend  Bickel’s  work 
in  several  ways.  First,  we  have  fewer  restrictions  on  the  distribution  function  to  get  convergence  of 
moment  results  for  L-estimates.  There  is  a  tradeoff  between  restrictions  on  the  distribution  function 
and  restrictions  on  the  weight  function  for  an  L-estimate  in  formulating  theorems  on  the  asymptotics 
of an L-estimate. Bickel proves results with fewer restrictions on the weight function. Second, we
extend  his  results  to  higher  order  moment  expansions.  Third,  our  results  go  beyond  his  in  that  we 
apply  them  to  M-estimates  and  have  the  potential  to  apply  them  to  other  robust  estimates. 

Stigler  (1974)  has  shown  that  the  variances  of  many  L-estimates  converge  to  those  of  their 
limiting distributions. His method of proof is to use Hájek projections, which requires $L_2$-convergence
to  get  convergence  in  distribution  results.  The  basic  assumptions  needed  are  a  smooth  weight 
function  and  either  the  existence  of  a  variance  of  the  underlying  distribution  or  the  deletion  of  a 
proportion  of  the  extreme  order  statistics.  Our  theorem  on  L-estimates  is  an  extension  to  higher 
moments  and  higher  order  expansions  of  his  theorem  5  which  has  weaker  conditions  on  the  tails  of 
the  underlying  distribution. 

Mason (1981) extends Stigler's results on the convergence of variances of L-estimates in
the  case  where  the  variance  of  the  underlying  distribution  does  not  exist.  Instead  of  requiring  that 
a  positive  proportion  of  the  extreme  order  statistics  have  coefficient  zero,  he  requires  only  that  a 
finite  number  of  the  extreme  order  statistics  have  coefficient  zero.  Although  we  have  not  done  so,  it 
should  be  possible  to  extend  our  results  in  this  fashion. 

Eyrion (1982) applies some of the theory given here in a study of location and scale invariant
M-estimates and P-estimates (P-estimates are analogs of Pitman estimates; see Johns (1979)).
He has also attempted to automate much of the algebra and
calculus on which we spend considerable energy in chapter 4.


Chapter  2 

Convergence  of  moments  of  quantiles 


§2.1  Introduction. 


Let $X_1, X_2, \ldots$ be independent, identically distributed (iid) random variables with cumulative
distribution function (cdf) $F$. Our convention will be that $F$ is right continuous;
i.e., $F(x) = P\{X_1 \le x\}$. Assume $c \in (0, 1)$, $F(q) = c$, and the derivative of $F$ at $q$ is $f(q) > 0$.
Denote the order statistics of $X_1, X_2, \ldots, X_n$ by $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$. Let $a_n = cn + O(1)$. Assume
$Z \sim N(0, c(1 - c)/f^2(q))$. Given these assumptions, we will consider conditions on the tails of $F$
which are necessary and sufficient for $E[g(\sqrt{n}(X_{a_n:n} - q))]$ to converge to $E[g(Z)]$. We will consider
functions $g$ in a class which includes $g(x) = I_{(x_1,\infty)}(x)$, $g(x) = x^r$ and $g(x) = e^{tx}$. The corresponding
results are convergence in distribution, convergence of moments, and convergence of the moment
generating function, respectively. One pair of necessary and sufficient conditions for $g$ to be in this
class is


$$\liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log(|g(x)|)} > 0 \quad\text{and}\quad \liminf_{x\to\infty} \frac{-\log F(-x)}{\log(|g(-x)|)} > 0. \tag{2.1.1}$$


Another necessary and sufficient condition is that there exists $\delta > 0$ such that $E[|g(X_1)|^{\delta}] < \infty$.

A  theorem  which  will  eventually  connect  these  two  types  of  tail  conditions  will  be  given 
in section 2.2. This theorem is actually of some interest in itself as it may be used to determine
whether  or  not  the  expectation  of  a  function  of  a  random  variable  exists.  The  results  stated  above 
will  be  proved  in  section  2.3.  In  section  2.4  we  will  discuss  other  conditions  related  to  the  tail 
conditions  in  (2.1.1).  The  moment  convergence  results  will  be  extended  to  finite  linear  combinations 
of  quantiles  at  the  end  of  section  2.3  and  to  many  robust  statistics  in  chapters  3,  4  and  5.  Besides 
applications  to  moments  and  moment  generating  functions  of  quantiles  from  sequences  of  iid  random 
variables, applications to sequential occupancy and related problems are suggested by Holst (1981),
and  Anderson,  Sobel  and  Uppuluri  (1982). 

Smirnov (1952) and Wretman (1978) have (independently) shown that $\sqrt{n}(X_{a_n:n} - q)$ is
asymptotically normal if $F$ is differentiable at $q$. This is also a simple consequence of a result on
Bahadur representation given by Ghosh (1971). Asymptotic theory for the case $f(q) = 0$, the case
where left and right hand derivatives of $F$ at $q$ differ, and the case with $a_n = cn + o(\sqrt{n})$ will not
be given here, although it should be easy to extend the present theory to these cases. Smirnov (1952) gives
asymptotic distribution theory for these cases.

Other  authors  have  considered  asymptotic  behavior  of  moments  of  order  statistics.  The 
differences  in  the  present  treatment  are  that  we  consider  weaker  conditions  on  F  and  have  introduced 
tail  conditions  which  are  equivalent  to  other  conditions  which  have  been  used.  Sen  (1959)  assumes 
F  is  continuous  everywhere  (for  convenience)  and  twice  differentiable  in  some  neighborhood  of  q. 
The  second  derivative  of  F  at  q  is  needed  to  obtain  a  convergence  rate.  Bickel  (1967)  assumes  that 
$f$ is continuous and strictly positive on $\{x : 0 < F(x) < 1\}$ and shows that there exists an $o(n^{-r/2})$
bound for $E[(\sqrt{n}(X_{a_n:n} - q))^r] - E[Z^r]$ which is independent of $c$. He remarks that for $c$ fixed the
only local requirement on $F$ needed is that $f$ be continuous in a neighborhood of $c$; this is more
than we require. More discussion on tail conditions and local conditions on $F$ is given in section 2.4.

Bounds for moments of $X_{i:n}$ have been given by various authors. Many such results
are  summarized  in  David  (1980).  In  general,  one  must  do  more  calculation  and/or  make  more 
assumptions  to  get  these  bounds  than  to  get  the  moment  convergence  results  given  here.  David 
also  summarizes  work  done  on  higher  order  expansions  of  moments  of  order  statistics.  Of  the  work 
presented  there,  the  work  of  David  and  Johnson  (1954)  has  the  closest  relation  to  our  work.  Their 
work  is  somewhat  heuristic  in  that  they  do  not  give  tail  conditions  necessary  for  their  results. 

§2.2  Tails  and  expectations. 

A ‘nearly’ necessary and sufficient condition for the existence of the mean of a distribution
will  be  given  in  this  section.  Results  on  the  existence  of  moments  and  the  moment  generating 
function  will  be  considered  as  examples  of  extensions  of  this  result.  Finally,  we  extend  these  results 
by  considering  the  moments  and  the  moment  generating  function  of  an  order  statistic. 

Theorem  2.2.1  provides  the  main  result  needed  to  determine  when  moments  of  (functions 
of)  standardized  quantiles  do  not  converge.  Although  it  is  similar  to  known  results,  we  have  not 
found  the  result  in  the  literature. 




Theorem 2.2.1. Suppose $X$ is an arbitrary non-negative random variable with distribution
function $F$. Let

$$a = \liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log x}.$$

If $a < 1$ then $E[X] = \infty$. If $a > 1$ then $E[X] < \infty$. If $a = 1$ either $E[X] = \infty$ or $E[X] < \infty$
may hold.


Proof: The last part of the proposition will be proved first. If $F(x) = 1 - 1/x$ for $x \ge 1$ then $a = 1$
and $E[X] = \infty$. If $F(x) = 1 - x^{-1}e^{-\sqrt{\log x}}$ for $x \ge 1$ then $a = 1$ and $E[X] = 3$.

Assume $a > 1$. Then there exist $\lambda \in (1, a)$ and $x_1$ such that if $x \ge x_1$ then
$-\log(1 - F(x)) \ge \lambda \log x$. This implies that if $x \ge x_1$ then $1 - F(x) \le x^{-\lambda}$ and thus $E[X] < \infty$.

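Now suppose $a < 1$. It will be shown that $\sum_{k=1}^{\infty} P\{X \ge k\} = \infty$, which implies
$E[X] = \infty$. If $a < 1$ then there exist $\lambda \in (a, 1)$ and $x_1 < x_2 < \cdots$ such that $1 - F(x_n) \ge x_n^{-\lambda}$,
$n = 1, 2, \ldots$, and $x_n \to \infty$ as $n \to \infty$. Let $y_n$ be the greatest integer less than or equal to $x_n$,
$n = 1, 2, \ldots$, and let $y_0 = 0$. Then

$$\sum_{k=1}^{\infty} P\{X \ge k\} = \sum_{n=1}^{\infty} \sum_{k=y_{n-1}+1}^{y_n} P\{X \ge k\} \ge \sum_{n=1}^{\infty} \frac{y_n - y_{n-1}}{x_n^{\lambda}} \ge \lim_{n\to\infty} \frac{y_n}{x_n^{\lambda}} = \infty. \qquad ∎$$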
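The boundary example in the proof of theorem 2.2.1 can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names are ours, and the substitution $x = e^{t^2}$ is simply one convenient way to evaluate $\int_1^\infty (1 - F(x))\,dx$ for $1 - F(x) = x^{-1}e^{-\sqrt{\log x}}$.

```python
import math


def tail_ratio(log_x):
    """-log(1 - F(x)) / log x for 1 - F(x) = exp(-sqrt(log x)) / x.

    Written in terms of log x so that very large x cause no underflow:
    -log(1 - F(x)) = log x + sqrt(log x).
    """
    return (log_x + math.sqrt(log_x)) / log_x


def mean_by_quadrature(upper_t=60.0, steps=200_000):
    """E[X] = 1 + integral over x >= 1 of (1 - F(x)) dx.

    With x = exp(t**2) the integral becomes the integral over t >= 0 of
    2 * t * exp(-t), which is evaluated here by the midpoint rule and
    truncated at upper_t (the neglected tail is negligible).
    """
    h = upper_t / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += 2.0 * t * math.exp(-t)
    return 1.0 + h * total


if __name__ == "__main__":
    print(tail_ratio(1e8))        # close to 1, consistent with a = 1
    print(mean_by_quadrature())   # close to 3
```

The ratio equals $1 + 1/\sqrt{\log x}$, so it decreases to $a = 1$ very slowly, while the mean is nevertheless finite.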


Definition 2.2.2. For an arbitrary non-decreasing function $g$ let $g^{-1}(x) = \inf\{y : g(y) \ge x\}$.

Corollary 2.2.3. Let $X$ be an arbitrary random variable and denote its distribution function by
$F$. Let $g$ be an arbitrary non-decreasing, non-negative function such that $g(x) \to \infty$ as $x \to F^{-1}(1)$.
Let

$$a = \liminf_{x\to F^{-1}(1)} \frac{-\log(1 - F(x))}{\log g(x)}.$$

If $a > 1$ then $E[g(X)] < \infty$, if $a < 1$ then $E[g(X)] = \infty$, and if $a = 1$ either $E[g(X)] = \infty$ or
$E[g(X)] < \infty$ may hold.


Proof: Let $1 < y < \infty$ and let $x = g^{-1}(y)+$. Then $g(x) \ge y$ and

$$\frac{-\log(1 - F(x))}{\log g(x)} \le \frac{-\log(1 - F(g^{-1}(y)))}{\log y}.$$

Since as $y \to \infty$, $x$ chosen in this fashion goes to $F^{-1}(1)$ it follows that

$$\liminf_{x\to F^{-1}(1)} \frac{-\log(1 - F(x))}{\log g(x)} \le \liminf_{y\to\infty} \frac{-\log(1 - F(g^{-1}(y)))}{\log y}.$$

If we fix $x \in (g^{-1}(1+), F^{-1}(1))$ and let $y = g(x)$ then $g^{-1}(y) \le x$ and

$$\frac{-\log(1 - F(x))}{\log g(x)} \ge \frac{-\log(1 - F(g^{-1}(y)))}{\log y}.$$

As $x \to F^{-1}(1)$, $y$ chosen in this fashion goes to $\infty$ and thus

$$\liminf_{x\to F^{-1}(1)} \frac{-\log(1 - F(x))}{\log g(x)} \ge \liminf_{y\to\infty} \frac{-\log(1 - F(g^{-1}(y)))}{\log y}.$$

We have now shown

$$\liminf_{x\to F^{-1}(1)} \frac{-\log(1 - F(x))}{\log g(x)} = \liminf_{y\to\infty} \frac{-\log(1 - F(g^{-1}(y)))}{\log y}.$$

Since

$$\liminf_{y\to\infty} \frac{-\log(1 - F(g^{-1}(y)))}{\log y} = \liminf_{x\to\infty} \frac{-\log(1 - F(g^{-1}(x+)))}{\log x} = \liminf_{x\to\infty} \frac{-\log P\{g(X) > x\}}{\log x}$$

the contention follows from theorem 2.2.1. ∎


Corollary 2.2.4. For an arbitrary random variable $X$ with distribution function $F$ let

$$a_+ = \liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log x}, \qquad a_- = \liminf_{x\to\infty} \frac{-\log F(-x)}{\log x},$$

and $a = \min(a_+, a_-)$. If $0 < r < a$ then $E[|X|^r] < \infty$. If $r > a$ then $E[|X|^r] = \infty$. If $r = a > 0$
then either $E[|X|^r] = \infty$ or $E[|X|^r] < \infty$ may hold.
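To make the quantity $a_+$ of corollary 2.2.4 concrete, the ratio $-\log(1 - F(x))/\log x$ can be evaluated at a large $x$ for distributions whose tails are known in closed form. The sketch below is illustrative only; the function names and the Pareto index 2.5 are our own choices. For a Pareto tail $1 - F(x) = x^{-\alpha}$ the ratio equals $\alpha$ for every $x > 1$, and for the standard Cauchy distribution it tends to 1, in agreement with the fact that $E[|X|^r]$ is finite for $r < 1$ and infinite for $r > 1$.

```python
import math


def tail_index_ratio(log_survival, x):
    """-log(1 - F(x)) / log x evaluated at a single (large) x."""
    return -log_survival(x) / math.log(x)


def pareto_log_survival(x, alpha=2.5):
    """log(1 - F(x)) for the Pareto tail 1 - F(x) = x**(-alpha), x >= 1."""
    return -alpha * math.log(x)


def cauchy_log_survival(x):
    """log(1 - F(x)) for the standard Cauchy distribution, x > 0.

    Uses 1 - F(x) = 1/2 - atan(x)/pi = atan(1/x)/pi, which stays
    accurate for very large x.
    """
    return math.log(math.atan(1.0 / x) / math.pi)


if __name__ == "__main__":
    print(tail_index_ratio(pareto_log_survival, 1e100))
    print(tail_index_ratio(cauchy_log_survival, 1e100))
```

A single large $x$ only approximates the lim inf; for tails that oscillate, the ratio would have to be examined along a sequence of $x$ values.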


Corollary 2.2.5. For an arbitrary random variable $X$ with distribution function $F(x)$ let

$$a_2 = \liminf_{x\to\infty} \frac{-\log(1 - F(x))}{x} \quad\text{and}\quad a_1 = -\liminf_{x\to\infty} \frac{-\log F(-x)}{x}.$$

If $t \in (a_1, a_2)$ then $E[e^{tX}] < \infty$. If $t < a_1$ or $t > a_2$ then $E[e^{tX}] = \infty$.

Note that $x$ is in the denominator of the functions defining $a_1$ and $a_2$ whereas $\log x$ is in the
denominator when defining $a_+$ and $a_-$. The values $a_+$ and $a_-$ are equally useful when considering
the existence of moments of order statistics.


Theorem 2.2.6. Define $a_+$ and $a_-$ as in corollary 2.2.4. If $i > r/a_-$ and $n - i + 1 > r/a_+$
then $E[|X_{i:n}|^r] < \infty$. If $i < r/a_-$ or $n - i + 1 < r/a_+$ then $E[|X_{i:n}|^r] = \infty$.


Proof: First assume $n - i + 1 < r/a_+$. Since

$$P\{|X_{i:n}|^r > x\} \ge P\{X_{i:n} > x^{1/r}\} \ge (1 - F(x^{1/r}))^{n-i+1}$$

and since

$$\liminf_{x\to\infty} \frac{-\log\big((1 - F(x^{1/r}))^{n-i+1}\big)}{\log x} = \frac{n - i + 1}{r} \liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log x} < 1 \tag{2.2.1}$$

it follows from theorem 2.2.1 that $E[|X_{i:n}|^r] = \infty$. For $i < r/a_-$ the proof is analogous.

Now assume $r/a_- < i$ and $r/a_+ < n - i + 1$. We have

$$\liminf_{x\to\infty} \frac{-\log P\{|X_{i:n}|^r > x\}}{\log x} \ge \min\left( \liminf_{x\to\infty} \frac{-\log P\{X_{i:n} < -x^{1/r}\}}{\log x},\; \liminf_{x\to\infty} \frac{-\log P\{X_{i:n} > x^{1/r}\}}{\log x} \right). \tag{2.2.2}$$

Since

$$P\{X_{i:n} > x^{1/r}\} \le \binom{n}{i-1} (1 - F(x^{1/r}))^{n-i+1},$$

the inequality in (2.2.1) can be switched in this case to $>$ and we have

$$\liminf_{x\to\infty} \frac{-\log P\{X_{i:n} > x^{1/r}\}}{\log x} > 1.$$

Similarly,

$$\liminf_{x\to\infty} \frac{-\log P\{X_{i:n} < -x^{1/r}\}}{\log x} > 1$$

and from (2.2.2) and theorem 2.2.1 the contention follows. ∎

§2.3  Convergence  of  moments  of  standardized  quantiles. 


We  are  now  prepared  to  address  the  questions  of  interest.  The  proof  of  the  most  general 
moment  convergence  result  for  quantiles  is  analytic  and  somewhat  tedious  in  nature.  The  proof  of 
convergence  does  not  use  uniform  integrability.  A  proof  using  uniform  integrability  would  require 
analysis  similar  to  that  given  below.  The  proof  of  the  necessity  of  tail  conditions  such  as  (2.1.1)  for 
the  existence  of  moments  of  standardized  quantiles  will  now  become  trivial  in  many  cases.  Following 
is  the  most  general  result  concerning  ‘necessity’  that  will  be  given.  It  is  a  variation  of  theorem  2.2.6. 
In applications we will take $n > 1$ and $\beta$ (defined in the theorem) to be $\sqrt{n}$.


Theorem 2.3.1. Suppose $g$ is a non-decreasing, non-negative function, and $X_1, X_2, \ldots$ are
independent identically distributed random variables with distribution function $F$ where
$F^{-1}(1) = \infty$. Suppose further that

$$\liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log g(x)} = 0.$$

Then for any $\beta > 1$, real $\eta$, and positive integers $n$, $i$, $1 \le i \le n$, it follows that

$$E[g(\beta(X_{i:n} - \eta))] = \infty.$$


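Proof: Since for large $x$

$$P\{g(\beta(X_{i:n} - \eta)) > x\} \ge P\{g(X_{i:n}) > x\} \ge (1 - F(g^{-1}(x+)))^{n-i+1}$$

it follows that

$$\liminf_{x\to\infty} \frac{-\log P\{g(\beta(X_{i:n} - \eta)) > x\}}{\log x} \le (n - i + 1) \liminf_{x\to\infty} \frac{-\log(1 - F(g^{-1}(x+)))}{\log x}.$$

From the proof of corollary 2.2.3 we can see that

$$\liminf_{x\to\infty} \frac{-\log(1 - F(g^{-1}(x+)))}{\log x} = \liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log g(x)}$$

and the proof is completed by the application of theorem 2.2.1. ∎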


We  will  now  address  the  problem  of  convergence.  It  will  be  useful  to  label  the  following 
assumptions: 

i. $g$ is a finite, continuous, non-decreasing, non-negative function defined for all real numbers.

ii. $g$ is bounded, or there exist $\beta, x_0 > 0$ such that if $t \ge 1$, $x \ge x_0$ then $\log g(tx) \le t\beta \log g(x)$.

iii. $\displaystyle\liminf_{x\to\infty} \frac{-\log(1 - F(x))}{\log g(x)} > 0$.

iv. $c \in (0, 1)$, $a_n = cn + O(1)$, $c_n = a_n/n = c + O(1/n)$.

v. $F(q) = c$, $\frac{d}{dx}F(x)\big|_{x=q} = f(q) > 0$.

We have chosen assumptions i and ii to make the proofs of our general results simple.
The functions $g$ in which we are interested are $I_{[y,\infty)}(x)$ (this does not satisfy condition i, but this
problem can be overcome by smoothing), $x^r I_{[0,\infty)}(x)$, $r = 1, 2, \ldots$, and $e^{tx}$. Clearly $e^{tx}$ for $t > 0$
and $I_{[y,\infty)}(x)$ satisfy ii. To show that $x^r I_{[0,\infty)}(x)$ satisfies ii we let $x \ge x_0 \ge e$. For $t \ge 1$ we define
$h_1(t) = r\log(tx)$. Then $h_1(1) = r\log x$ and for $t \ge 1$, $h_1'(t) = r/t \le r$. Now define $h_2(t) = rt\log x$.
Then $h_2(1) = r\log x$ and for $t \ge 1$, $h_2'(t) = r\log x \ge r$. These facts imply that if $t \ge 1$ then $h_1(t)
\le h_2(t)$. It now follows that if $x_0 \ge e$ then ii holds (with $\beta = 1$) for $x^r I_{[0,\infty)}(x)$. More discussion on assumption
ii will be given in section 2.4.
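The verification of assumption ii for $g(x) = x^r I_{[0,\infty)}(x)$ can be spot-checked on a grid. This is only a sketch under our own naming; a finite grid cannot prove the inequality, but it does show why the restriction $x_0 \ge e$ matters.

```python
import math


def log_g_power(x, r):
    """log g(x) for g(x) = x**r on x > 0."""
    return r * math.log(x)


def assumption_ii_holds(log_g, beta, x0, ts, xs, tol=1e-12):
    """Check log g(t*x) <= t * beta * log g(x) for grid points t >= 1, x >= x0.

    tol absorbs floating-point rounding in the equality case t = 1.
    """
    return all(
        log_g(t * x) <= t * beta * log_g(x) + tol
        for t in ts
        for x in xs
        if t >= 1 and x >= x0
    )


if __name__ == "__main__":
    ts = [1 + 0.5 * j for j in range(50)]
    xs = [math.e * (1 + 0.3 * j) for j in range(50)]

    def cube(x):
        return log_g_power(x, 3)

    print(assumption_ii_holds(cube, 1.0, math.e, ts, xs))
    # With x0 close to 1, log x is small and the inequality can fail.
    print(assumption_ii_holds(cube, 1.0, 1.1, [2.0], [1.1]))
```

The second call fails because $\log(2 \cdot 1.1) > 2\log(1.1)$; the inequality $\log t \le (t - 1)\log x$ used implicitly in the text needs $\log x \ge 1$.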
To avoid complicated formulas for cases with $F$ not continuous we use the representation

$$E[g(\sqrt{n}(X_{a_n:n} - q))] = \int_0^1 g(\sqrt{n}(F^{-1}(u) - q))\, n\binom{n-1}{a_n-1} u^{a_n-1}(1 - u)^{n-a_n}\,du. \tag{2.3.1}$$

Using Stirling's formula $n! = \sqrt{2\pi}\, n^{n+1/2} e^{-n} e^{\theta_n/(12n)}$ where $0 < \theta_n < 1$ (see, e.g., Rényi (1970),
p. 149 ff.) and letting $c_n = c + O(1/n)$ as in iv it is a straightforward calculation to show that as
$n \to \infty$

$$\binom{n-1}{a_n-1} = (\sqrt{2\pi n})^{-1}\, c_n^{-nc_n+1/2} (1 - c_n)^{-n(1-c_n)-1/2} (1 + O(1/n)). \tag{2.3.2}$$
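The approximation (2.3.2) is easy to check numerically in log space. The sketch below uses our own function names and takes $a_n$ to be $cn$ rounded to the nearest integer, which is one admissible choice of $a_n = cn + O(1)$; the exact binomial coefficient is computed through the log-gamma function.

```python
import math


def log_binom_exact(n, a):
    """log of binom(n - 1, a - 1), using lgamma(m + 1) = log(m!)."""
    return math.lgamma(n) - math.lgamma(a) - math.lgamma(n - a + 1)


def log_binom_approx(n, a):
    """log of the leading term on the right hand side of (2.3.2)."""
    c_n = a / n
    return (
        -0.5 * math.log(2.0 * math.pi * n)
        + (0.5 - n * c_n) * math.log(c_n)
        + (-0.5 - n * (1.0 - c_n)) * math.log(1.0 - c_n)
    )


def stirling_ratio(n, c):
    """Exact coefficient divided by its approximation; should be 1 + O(1/n)."""
    a = round(c * n)
    return math.exp(log_binom_exact(n, a) - log_binom_approx(n, a))


if __name__ == "__main__":
    for n in (100, 1000, 100000):
        print(n, stirling_ratio(n, 0.3))
```

The printed ratios should approach 1 at rate $1/n$, consistent with the $1 + O(1/n)$ factor.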

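Thus uniformly for $u \in (0, 1)$

$$n\binom{n-1}{a_n-1} u^{a_n-1}(1 - u)^{n-a_n} \sim \sqrt{\frac{nc}{2\pi(1 - c)}}\; \frac{1}{u} \left(\frac{u}{c_n}\right)^{nc_n} \left(\frac{1 - u}{1 - c_n}\right)^{n(1-c_n)}. \tag{2.3.3}$$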


Let

$$p(u, v) = v(\log v - \log u) + (1 - v)(\log(1 - v) - \log(1 - u)). \tag{2.3.4}$$

In the following we let $\log 0 = -\infty$ and $e^{-\infty} = 0$. From this and (2.3.1)-(2.3.4) it follows that if $g$ is
non-negative and $c_n = c + O(1/n)$ then as $n \to \infty$

$$E[g(\sqrt{n}(X_{a_n:n} - q))] \sim \int_0^1 \sqrt{\frac{nc}{2\pi(1 - c)}}\; \frac{1}{u} \exp\big(\log g(\sqrt{n}(F^{-1}(u) - q)) - n\,p(u, c_n)\big)\,du. \tag{2.3.5}$$
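The function $p(u, v)$ of (2.3.4) coincides with the Kullback-Leibler divergence of a Bernoulli($u$) distribution from a Bernoulli($v$) distribution; this identification is our remark and is not used in the text. It explains why $p \ge 0$ with equality only at $u = v$, and why $p(u, c) \approx (u - c)^2/(2c(1 - c))$ near $u = c$, which is the source of the Gaussian factor in the limit. A minimal numerical sketch, with function names of our own choosing:

```python
import math


def p(u, v):
    """p(u, v) of (2.3.4), for u and v strictly inside (0, 1)."""
    return v * (math.log(v) - math.log(u)) + (1.0 - v) * (
        math.log(1.0 - v) - math.log(1.0 - u)
    )


def quadratic_approx(u, c):
    """Second order Taylor approximation of p(u, c) about u = c."""
    return (u - c) ** 2 / (2.0 * c * (1.0 - c))


if __name__ == "__main__":
    grid = [j / 10 for j in range(1, 10)]
    print(min(p(u, v) for u in grid for v in grid))    # never negative
    print(p(0.31, 0.3), quadratic_approx(0.31, 0.3))   # nearly equal
```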

The  proof  of  the  following  lemma  contains  the  key  ideas  needed  to  show  convergence  of  expectations 
of  functions  of  standardized  quantiles. 

Lemma 2.3.2. Under assumptions i-iv if $\varepsilon, b > 0$ and $\gamma < 1/4$ then

$$0 = \lim_{n\to\infty} \int_{c+\varepsilon n^{-\gamma}}^{1} n^b\, \frac{1}{u} \exp\big(\log g(\sqrt{n}(F^{-1}(u) - q)) - n\,p(u, c_n)\big)\,du.$$


Proof: We choose an arbitrary $\varepsilon > 0$. Without loss of generality we assume $\gamma \in (0, 1/4)$ and
$q \le F^{-1}(c)$.

First we will show that the integrand goes pointwise to zero. We will consider a Taylor's
series approximation of $p$ in the neighborhood of $(c, c)$. The first and second partial derivatives of $p$
are

$$\frac{\partial p}{\partial v} = \log v - \log u - \log(1 - v) + \log(1 - u), \tag{2.3.6}$$

$$\frac{\partial p}{\partial u} = \frac{-v}{u} + \frac{1 - v}{1 - u}, \tag{2.3.7}$$

$$\frac{\partial^2 p}{\partial v^2} = \frac{1}{v} + \frac{1}{1 - v} > 0, \qquad \frac{\partial^2 p}{\partial u^2} = \frac{v}{u^2} + \frac{1 - v}{(1 - u)^2} > 0,$$

and

$$\frac{\partial^2 p}{\partial u\,\partial v} = -\frac{1}{u} - \frac{1}{1 - u}.$$

Using the first order Taylor's series expansion with remainder, it follows from the above that for
some $\theta_1(u, v)$ in the closed interval between $c$ and $u$ and $\theta_2(u, v)$ in the closed interval between $c$ and
$v$

$$p(u, v) = \frac{(u - c)^2}{2}\left(\frac{\theta_2(u, v)}{\theta_1(u, v)^2} + \frac{1 - \theta_2(u, v)}{(1 - \theta_1(u, v))^2}\right) + \frac{(v - c)^2}{2}\left(\frac{1}{\theta_2(u, v)} + \frac{1}{1 - \theta_2(u, v)}\right) - (u - c)(v - c)\left(\frac{1}{\theta_1(u, v)} + \frac{1}{1 - \theta_1(u, v)}\right).$$

Since $c_n = c + O(1/n)$ and $p(u, v)$ (from (2.3.7)) is increasing in $u$ for $u > v$ we have for some $N_1$,
all $n \ge N_1$, and $u \in [c + \varepsilon n^{-\gamma}, 1)$

$$p(u, c_n) \ge c\,\varepsilon^2 n^{-2\gamma}/4.$$

Thus for $n \ge N_1$ the integrand is bounded by

$$c^{-1} \exp\big(b\log n + \log g(\sqrt{n}(F^{-1}(u) - q)) - n^{1-2\gamma} c\,\varepsilon^2/4\big). \tag{2.3.8}$$

For $g$ bounded the result follows immediately from the dominated convergence theorem since (2.3.8)
goes pointwise to zero and since if $n \ge N_1$ this may be bounded by some constant for all $u$. Similarly,
the result now follows for the case with $q = F^{-1}(c) = F^{-1}(1)$.

Now assume that $g$ is not bounded and $F^{-1}(1) = \infty$. Let $c_0 = \sup_{n \ge N_1} c_n$. We suppose
$N_1$ is sufficiently large so that $c_0 < 1$. It follows from (2.3.6) that if $n \ge N_1$ and $u \ge c_0$ then

$$p(u, c_n) \ge p(u, c_0). \tag{2.3.9}$$

From the assumption that $\liminf_{x\to\infty}(-\log(1 - F(x))/\log g(x)) > 0$ (assumption iii) we see that
there exist $u_1 \in (c_0, 1)$, $\eta_1 > 0$ such that if $u \in (u_1, 1)$ then

$$-\log(1 - u) \ge \eta_1 \log g(F^{-1}(u)).$$

From (2.3.4) we see that there exists $\eta_2 > 0$ such that for $u \in (u_1, 1)$

$$p(u, c_0) \ge -\eta_2 \log(1 - u).$$

Thus for $u \in (u_1, 1)$ and $\eta = \eta_1\eta_2 > 0$

$$p(u, c_0) \ge \eta \log(g(F^{-1}(u))). \tag{2.3.10}$$

We now assume in addition to the above that $u_1$ is so large that $F^{-1}(u_1) - q \ge x_0$ where $x_0$ is as
in ii. We let $k \ge 1$ (this is needed below because $q$ may be less than zero) be such that if $u \in (u_1, 1)$
and $n \ge N_1$ then

$$\log g(\sqrt{n}(F^{-1}(u) - q)) \le k\sqrt{n}\,\beta \log g(F^{-1}(u)). \tag{2.3.11}$$

From (2.3.9)-(2.3.11) we may now bound the integrand of the lemma for $u \in (u_1, 1)$, $n \ge N_1$ by

$$c^{-1} \exp\big(b\log n + (k\sqrt{n}\,\beta - n\eta) \log g(F^{-1}(u))\big).$$

Given this bound we may now apply the dominated convergence theorem to the integral on the
interval $(u_1, 1)$.

If $F^{-1}(1) < \infty$ and $q < F^{-1}(1)$ let $u_1 = 1$. Otherwise let $u_1$ be as in the previous
paragraph. Let $x_0$ be as in ii. Let $N_2 \ge N_1$ be such that $\sqrt{N_2}(F^{-1}(u_1) - q) \ge x_0$. Assumption ii
then implies that for $n \ge N_2$ and $c + \varepsilon n^{-\gamma} \le u \le u_1$

$$\log g(\sqrt{n}(F^{-1}(u) - q)) \le \log g(\sqrt{n}(F^{-1}(u_1) - q)) \le \beta\sqrt{n/N_2}\, \log g(\sqrt{N_2}(F^{-1}(u_1) - q)),$$

which with (2.3.8) implies that the integrand is bounded on $[c + \varepsilon n^{-\gamma}, u_1]$ for $n \ge N_2$ by

$$c^{-1} \exp\big(b\log n + \beta\sqrt{n/N_2}\, \log g(\sqrt{N_2}(F^{-1}(u_1) - q)) - n^{1-2\gamma} c\,\varepsilon^2/4\big).$$

It follows that the integral on $(c + \varepsilon n^{-\gamma}, u_1]$ goes to zero as $n \to \infty$ by the dominated convergence
theorem, and the proof is complete. ∎

One  more  result  is  needed  to  show  convergence  of  moments.  We  could  choose  to  show 
uniform integrability of $g(\sqrt{n}(X_{a_n:n} - q))$ and then use Wretman's (1978) result for convergence in
distribution.  However,  it  is  almost  as  easy  to  show  convergence  of  moments  directly  using  lemma 
2.3.3  given  below.  The  constant  2/9  of  the  lemma  is  arbitrary.  The  proof  holds  when  this  value  is 
replaced  by  any  number  between  1/5  and  1/2.  A  constant  less  than  1/4  is  needed  to  use  the  lemma 
in  conjunction  with  lemma  2.3.2. 

Lemma 2.3.3. Assume conditions i, ii, and iv hold, $k > 0$, and $Y \sim N(0, k^2c(1 - c))$. Then for
any $\varepsilon > 0$ as $n \to \infty$

$$\xi_n(\varepsilon) \equiv \int_{c-\varepsilon/n^{2/9}}^{c+\varepsilon/n^{2/9}} g(k\sqrt{n}(u - c))\, n\binom{n-1}{a_n-1} u^{a_n-1}(1 - u)^{n-a_n}\,du \to E[g(Y)].$$
Proof: Assume $k = 1$. By applying equation (2.3.3) and letting $z = \sqrt{n}(u - c_n)$ we see that

$$\xi_n(\varepsilon) \sim \int_{-\varepsilon n^{5/18}+\sqrt{n}(c-c_n)}^{\varepsilon n^{5/18}+\sqrt{n}(c-c_n)} \frac{g(z + \sqrt{n}(c_n - c))}{\sqrt{2\pi c(1 - c)}} \left(1 + \frac{z}{\sqrt{n}\,c_n}\right)^{a_n-1} \left(1 - \frac{z}{\sqrt{n}(1 - c_n)}\right)^{n-a_n} dz.$$

We will use the Taylor's series expansion

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5(1 + \theta(x))^5}$$

which is valid for $-1 < x < 1$ and some $\theta(x)$ between 0 and $x$. Let

$$w_n(z) = \frac{(1 - 2c)z^3}{3\sqrt{n}\,c^2(1 - c)^2} - \frac{(1 - 3c + 3c^2)z^4}{4nc^3(1 - c)^3}.$$

It is a straightforward calculation to show that as $n \to \infty$

$$\max_{|z| \le \varepsilon n^{5/18}+\sqrt{n}|c-c_n|} \left| \log\left[\left(1 + \frac{z}{\sqrt{n}\,c_n}\right)^{a_n-1} \left(1 - \frac{z}{\sqrt{n}(1 - c_n)}\right)^{n-a_n}\right] + \frac{z^2}{2c(1 - c)} - w_n(z) \right| \to 0.$$

Thus

$$\xi_n(\varepsilon) \sim \int_{-\varepsilon n^{5/18}+\sqrt{n}(c-c_n)}^{\varepsilon n^{5/18}+\sqrt{n}(c-c_n)} \frac{g(z + \sqrt{n}(c_n - c))}{\sqrt{2\pi c(1 - c)}} \exp\big(-z^2/(2c(1 - c)) + w_n(z)\big)\,dz.$$

Since $c_n = c + O(1/n)$ assumption i implies $g(z + \sqrt{n}(c_n - c)) \to g(z)$ for each fixed $z$. Assumption
ii implies that $g$ grows at most exponentially. Since $z/\sqrt{n} = o(1)$ uniformly for all $z$ in the range
of integration it follows that $w_n(z) = o(z^2)$ uniformly in this range. These facts imply that the
integrand above can be bounded by $k_1\exp(-k_2z^2)$ for some $k_1, k_2 > 0$ and $n$ large. Thus by the
dominated convergence theorem $\xi_n(\varepsilon) \to E[g(Y)]$.

For $k \ne 1$ we let $g_1(x) = g(kx)$ and apply the result for $k = 1$ to $g_1$ since i, ii, and iv are
still satisfied. ∎
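The expansion at the heart of the proof of lemma 2.3.3 can be checked numerically. The sketch below, with function names of our own, compares the logarithm of the binomial-type kernel with $-z^2/(2c(1 - c)) + w_n(z)$ at a fixed $z$; we take $a_n$ to be $cn$ rounded to the nearest integer, one admissible choice under assumption iv.

```python
import math


def log_kernel(z, n, a):
    """log of (1 + z/(sqrt(n) c_n))**(a - 1) * (1 - z/(sqrt(n)(1 - c_n)))**(n - a)."""
    c_n = a / n
    root_n = math.sqrt(n)
    return (a - 1) * math.log1p(z / (root_n * c_n)) + (n - a) * math.log1p(
        -z / (root_n * (1.0 - c_n))
    )


def w_n(z, n, c):
    """Third and fourth order correction terms defined in the proof."""
    third = (1.0 - 2.0 * c) * z**3 / (3.0 * math.sqrt(n) * c**2 * (1.0 - c) ** 2)
    fourth = (1.0 - 3.0 * c + 3.0 * c**2) * z**4 / (4.0 * n * c**3 * (1.0 - c) ** 3)
    return third - fourth


def expansion_error(z, n, c):
    """|log kernel + z**2 / (2 c (1 - c)) - w_n(z)|, which should tend to 0."""
    a = round(c * n)
    return abs(log_kernel(z, n, a) + z**2 / (2.0 * c * (1.0 - c)) - w_n(z, n, c))


if __name__ == "__main__":
    for n in (10**4, 10**6):
        print(n, expansion_error(1.5, n, 0.3))
```

At fixed $z$ the remaining error is dominated by the factor with exponent $a_n - 1$ rather than $a_n$, which contributes a term of order $z/\sqrt{n}$.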


We  are  now  prepared  to  prove  the  most  general  convergence  result  which  we  shall  present. 


Theorem 2.3.4. If conditions i-v are satisfied and $Z \sim N(0, c(1-c)/f^2(q))$, then as $n \to \infty$

$$E[g(\sqrt{n}(X_{a_n:n} - q))] \to E[g(Z)].$$

Proof: From (2.3.1), (2.3.3), and lemma 2.3.2 and its obvious analog it follows that for any $\epsilon > 0$,

$$E[g(\sqrt{n}(X_{a_n:n}-q))] \sim \int_{c-\epsilon/n^{2/9}}^{c+\epsilon/n^{2/9}} g\bigl(\sqrt{n}(F^{-1}(u)-q)\bigr)\, n\binom{n-1}{a_n-1} u^{a_n-1}(1-u)^{n-a_n}\,du.$$

For any $\delta > 0$ there exists $\epsilon > 0$ such that for $u \in (c - \epsilon, c + \epsilon)$

$$(1-\delta)\frac{u-c}{f(q)} \le F^{-1}(u) - q \le (1+\delta)\frac{u-c}{f(q)}. \tag{2.3.12}$$

The result now follows from lemma 2.3.3 by substituting the two bounds into the integrand above and noting that $\delta > 0$ was arbitrary. ∎
We will now consider applications of theorems 2.3.1 and 2.3.4. For propositions 2.3.5-2.3.9 assume that iv and v hold, and that $Z \sim N(0, c(1-c)/f^2(q))$.

Proposition 2.3.5. As $n \to \infty$, $\sqrt{n}(X_{a_n:n} - q)$ converges in distribution to $Z$.

Proof: We will show that for $x_1$ arbitrary $P\{\sqrt{n}(X_{a_n:n} - q) > x_1\} \to P\{Z > x_1\}$ as $n \to \infty$. For $\epsilon > 0$ we let

$$g_\epsilon(x) = I_{[x_1,\infty)}(x) + \left(1 + \frac{x - x_1}{\epsilon}\right) I_{[x_1-\epsilon,\,x_1)}(x)$$

and apply theorem 2.3.4 to $g_\epsilon$. By letting $\epsilon \to 0$ we obtain

$$\limsup_{n\to\infty} P\{\sqrt{n}(X_{a_n:n} - q) > x_1\} \le P\{Z > x_1\}.$$

In a similar fashion we can show

$$\liminf_{n\to\infty} P\{\sqrt{n}(X_{a_n:n} - q) > x_1\} \ge P\{Z > x_1\},$$

and the proof is complete. ∎
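The smoothing device in this proof can be illustrated numerically. The following sketch is an illustration under stated assumptions and not part of the original proof: it takes $Z$ to be normal with a chosen standard deviation, uses the half-open ramp interval $[x_1 - \epsilon, x_1)$, and uses hypothetical function names. It checks that $E[g_\epsilon(Z)]$ is sandwiched between $P\{Z > x_1\}$ and $P\{Z > x_1 - \epsilon\}$, so that it approaches $P\{Z > x_1\}$ as $\epsilon \to 0$.

```python
import math

def g_eps(x, x1, eps):
    """Continuous piecewise-linear upper envelope of the indicator of [x1, inf)."""
    if x >= x1:
        return 1.0
    if x >= x1 - eps:
        return 1.0 + (x - x1) / eps
    return 0.0

def normal_sf(x, sigma=1.0):
    """P{Z > x} for Z ~ N(0, sigma^2)."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))

def expected_g_eps(x1, eps, sigma=1.0, steps=2000):
    """E[g_eps(Z)]: tail probability plus a midpoint-rule integral over the ramp."""
    h = eps / steps
    ramp = 0.0
    for i in range(steps):
        z = x1 - eps + (i + 0.5) * h
        dens = math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        ramp += g_eps(z, x1, eps) * dens * h
    return normal_sf(x1, sigma) + ramp
```

For example, calling `expected_g_eps(0.3, eps)` with `eps` equal to 0.5, 0.1 and 0.01 gives values that decrease toward `normal_sf(0.3)`.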

Proposition 2.3.6. Define $\alpha_+$ and $\alpha_-$ as in corollary 2.2.4. If $r$ is a positive integer then $E[(\sqrt{n}(X_{a_n:n} - q))^r] \to E[Z^r]$ as $n \to \infty$ if and only if $\alpha_+ > 0$ and $\alpha_- > 0$.

Proof: The necessity of $\alpha_+ > 0$ and $\alpha_- > 0$ follows from theorem 2.2.6 or theorem 2.3.1.

Now assume $\alpha_+ > 0$ and $\alpha_- > 0$. Letting $g(x) = x^r I_{\{x>0\}}(x)$ it follows from theorem 2.3.4 that $E[g(\sqrt{n}(X_{a_n:n} - q))] \to E[g(Z)]$. Letting $Y_i = -X_i$ and $b_n = n + 1 - a_n$, so that $Y_{b_n:n} = -X_{a_n:n}$ and the corresponding quantile of the $Y_i$ is $-q$, it follows in the same way that $E[g(\sqrt{n}(Y_{b_n:n} + q))] \to E[g(Z)]$. The result now follows since

$$E[(\sqrt{n}(X_{a_n:n} - q))^r] = E[g(\sqrt{n}(X_{a_n:n} - q))] + (-1)^r E[g(\sqrt{n}(Y_{b_n:n} + q))].$$


Proposition  2.3.7  below  follows  from  proposition  2.3.6  and  corollary  2.2.4.  Proposition  2.3.8 
follows  from  theorems  2.3.1  and  2.3.4.  Proposition  2.3.9  follows  from  proposition  2.3.8  and  corollary 
2.2.5. 

Proposition 2.3.7. If $r > 0$ then $E[(\sqrt{n}(X_{a_n:n} - q))^r] \to E[Z^r]$ as $n \to \infty$ if and only if there exists $\delta > 0$ such that $E[|X_1|^\delta] < \infty$.
Proposition 2.3.8. Define $\alpha'_+$ and $\alpha'_-$ as in corollary 2.2.5. Then $E[\exp(t\sqrt{n}(X_{a_n:n} - q))] \to E[e^{tZ}]$ for all $t > 0$ if and only if $\alpha'_+ > 0$. Similarly $E[\exp(t\sqrt{n}(X_{a_n:n} - q))] \to E[e^{tZ}]$ for all $t < 0$ if and only if $\alpha'_- > 0$.

Proposition 2.3.9. For all $t > 0$, $E[\exp(t\sqrt{n}(X_{a_n:n} - q))] \to E[e^{tZ}]$ if and only if there exists $\epsilon > 0$ such that $E[e^{\epsilon X_1}] < \infty$. Similarly $E[\exp(t\sqrt{n}(X_{a_n:n} - q))] \to E[e^{tZ}]$ for all $t < 0$ if and only if there exists $\epsilon > 0$ such that $E[e^{-\epsilon X_1}] < \infty$.


In  the  next  section  we  will  note  some  relations  that  sometimes  simplify  applications  of 
the  above  propositions.  These  arguments  and  propositions  2.3.6  and  2.3.7  imply  that  moments  of 
standardized  quantiles  from  all  commonly  considered  distributions  converge.  Propositions  2.3.8  and 
2.3.9  imply  that  the  moment  generating  functions  of  standardized  quantiles  from  distributions  with 
exponential  tails  converge  to  the  moment  generating  function  of  a  normal  distribution.  They  also 
imply  that  moment  generating  functions  for  standardized  quantiles  for  distributions  such  as  the 
Cauchy,  Pareto  and  slash  distributions  do  not  converge;  in  fact  from  theorem  2.3.1  it  follows  that 
they  do  not  exist  for  any  n. 

Before  closing  this  section  we  will  extend  our  result  on  convergence  of  moments  of  quantiles 
to  finite  linear  combinations  of  quantiles.  We  will  need  the  following  lemma  which  is  a  trivial 
extension  of  theorem  4.5.2  of  Chung  (1974). 

Lemma 2.3.10. If $Y_n$, $n = 1, 2, \ldots$ converges in distribution to $Y$, and for some $p > 0$, $\limsup_{n\to\infty} E[|Y_n|^p] = M < \infty$, then for each $r < p$

$$\lim_{n\to\infty} E[|Y_n|^r] = E[|Y|^r] < \infty.$$

If $r$ is a positive integer, then we may replace $|Y_n|^r$ and $|Y|^r$ above by $Y_n^r$ and $Y^r$, respectively.


Proposition 2.3.11. Let $\alpha_+$ and $\alpha_-$ be as in corollary 2.2.4. Suppose $F$ is a cdf, $0 < c_1 < \ldots < c_k < 1$, $c_i = F(q_i)$, and $(d/dx)F(x)|_{x=q_i} = f(q_i) > 0$, $1 \le i \le k$. Suppose further that $a_{in} = nc_i + O(1)$, $1 \le i \le k$. Finally, let $Z_i$, $1 \le i \le k$, have a multivariate normal distribution with $E[Z_i] = 0$, $E[Z_iZ_j] = c_i(1 - c_j)/(f(q_i)f(q_j))$, $1 \le i \le j \le k$. Then for any finite constants $b_i$, $1 \le i \le k$, and $r = 1, 2, \ldots$

$$\lim_{n\to\infty} E\left[\left(\sum_{i=1}^k b_i\sqrt{n}(X_{a_{in}:n} - q_i)\right)^r\right] = E\left[\left(\sum_{i=1}^k b_iZ_i\right)^r\right]$$

if and only if $\alpha_+ > 0$ and $\alpha_- > 0$.

Proof: If $\alpha_+ = 0$ or $\alpha_- = 0$ it follows from theorem 2.2.6 that $E[|X_{i:n}|^r] = \infty$ for $1 \le i \le n < \infty$. Thus the necessity of $\alpha_+ > 0$ and $\alpha_- > 0$ is established.


Assume $\alpha_+ > 0$ and $\alpha_- > 0$. First we wish to show that the vector $\sqrt{n}(X_{a_{1n}:n} - q_1, \ldots, X_{a_{kn}:n} - q_k)$ converges in distribution to $(Z_1, \ldots, Z_k)$. The proof of this is the same as that of David (1980), pp. 255-257, for a result with $F$ continuous everywhere. The basic tool used in the proof is the result of Ghosh (1971) on Bahadur representation. From the proof of proposition 2.3.6 we know that for $1 \le i \le k$

$$\limsup_{n\to\infty} E[|\sqrt{n}(X_{a_{in}:n} - q_i)|^r] = E[|Z_i|^r]. \tag{2.3.13}$$

Applying the Minkowski inequality repeatedly we have

$$\limsup_{n\to\infty}\left(E\left[\left|\sum_{i=1}^k b_i\sqrt{n}(X_{a_{in}:n} - q_i)\right|^r\right]\right)^{1/r} \le \limsup_{n\to\infty}\sum_{i=1}^k |b_i|\left(E[|\sqrt{n}(X_{a_{in}:n} - q_i)|^r]\right)^{1/r} \le \sum_{i=1}^k |b_i| \limsup_{n\to\infty}\left(E[|\sqrt{n}(X_{a_{in}:n} - q_i)|^r]\right)^{1/r}.$$

The proposition now follows from (2.3.13) and lemma 2.3.10. ∎


§2.4  Remarks 

In this section we will discuss assumptions ii and iii of the previous section and their implications. We will also note the local conditions on $F$ which have been used by other authors to show convergence of moments of quantiles.

We define

$$\alpha = \liminf_{x\to\infty}\frac{-\log(1-F(x))}{\log x} \quad\text{and}\quad \alpha' = \liminf_{x\to\infty}\frac{-\log(1-F(x))}{x}. \tag{2.4.1}$$

If $\alpha > 0$ then for any $\lambda \in (0, \alpha)$ there exists $x_\lambda$ such that if $x > x_\lambda$ then $1 - F(x) < x^{-\lambda}$. Similarly, if $\alpha' > 0$ then for any $\lambda \in (0, \alpha')$ there exists $x'_\lambda$ such that if $x > x'_\lambda$ then $1 - F(x) < e^{-\lambda x}$. Bickel (1967) has used the existence of some $\epsilon > 0$ such that $\lim_{x\to\infty} x^\epsilon(1 - F(x) + F(-x)) = 0$ to give moment results similar to (but less precise than) theorem 2.2.6 and to obtain results on convergence of moments. Blom (1959), p. 44, has used a bound proportional to $u^a(1-u)^b$ (with $a, b < 0$) for $|F^{-1}(u)|$ as a condition to obtain a result similar to theorem 2.2.6 for first and second moments (this result could be extended to higher moments easily). The results of section 2.2 showing when moments and moment generating functions do and do not exist demonstrate that defining the precise values of $\alpha$ and $\alpha'$ has some utility.
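To make the two tail indices of (2.4.1) concrete, the sketch below evaluates the ratios inside the two lim infs at a large argument for three familiar upper tails. It is an illustration only: the function names and the particular parameter values are assumptions introduced here, and evaluating at a single large $x$ only suggests the limiting value.

```python
import math

def alpha_ratio(log_sf, x):
    """-log(1 - F(x)) / log x; its lim inf as x -> infinity is alpha."""
    return -log_sf(x) / math.log(x)

def alpha_prime_ratio(log_sf, x):
    """-log(1 - F(x)) / x; its lim inf as x -> infinity is alpha'."""
    return -log_sf(x) / x

def pareto_log_sf(x, a=2.0):
    """log(1 - F(x)) for the Pareto tail 1 - F(x) = x**(-a), x >= 1."""
    return -a * math.log(x)

def exponential_log_sf(x, rate=1.5):
    """log(1 - F(x)) for the exponential tail 1 - F(x) = exp(-rate*x)."""
    return -rate * x

def cauchy_log_sf(x):
    """log(1 - F(x)) for the standard Cauchy, using 1 - F(x) = atan(1/x)/pi, x > 0."""
    return math.log(math.atan(1.0 / x) / math.pi)
```

In this sketch the Pareto tail gives a ratio equal to `a` for the first index and a ratio near zero for the second, the exponential tail gives a ratio equal to `rate` for the second index while the first ratio grows without bound, and the Cauchy tail gives a first ratio approaching one.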

Another concept related to $\alpha$ is regular variation. See, for example, Feller (1971) or de Haan (1970) for definitions and some elementary properties. The statement that $1 - F(x) < kx^{-\alpha}$ for some $k, \alpha > 0$ and large $x$ implies that $1 - F(x)$ is bounded by a function having regular variation. If $F(x) = 1 - x^{-\alpha}$, standard results in asymptotic extreme value theory imply that if $0 < \alpha < \infty$ then the distribution function of $X_{n:n}/n^{1/\alpha}$ converges vaguely to $\exp(-x^{-\alpha})$. Pickands (1968) studied asymptotic behavior of moments of sample extremes. His basic result was that if $b_nX_{n:n}$ has a limiting distribution which has an $r$th moment then $E[(b_nX_{n:n})^r]$ converges to that value. This implies that if $0 < \alpha < \infty$ where $\alpha$ is as in (2.4.1) then $E[X^r_{n:n}] = O(n^{r/\alpha})$.
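The extreme value limit quoted above can be checked directly, since for the Pareto cdf $F(t) = 1 - t^{-\alpha}$, $t \ge 1$, the distribution function of the normalized maximum is available in closed form. The sketch below is an illustration with hypothetical function names; it compares $P\{X_{n:n}/n^{1/\alpha} \le x\} = (1 - x^{-\alpha}/n)^n$ with the limit $\exp(-x^{-\alpha})$.

```python
import math

def max_cdf(x, n, alpha):
    """P{X_{n:n} / n**(1/alpha) <= x} when F(t) = 1 - t**(-alpha) for t >= 1."""
    t = x * n ** (1.0 / alpha)
    if t < 1.0:
        return 0.0
    return (1.0 - t ** (-alpha)) ** n

def frechet_cdf(x, alpha):
    """Limiting distribution function exp(-x**(-alpha)) for x > 0."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

def gap(x, n, alpha):
    """Absolute difference between the exact cdf of the maximum and its limit."""
    return abs(max_cdf(x, n, alpha) - frechet_cdf(x, alpha))
```

For fixed $x > 0$ the gap is of order $1/n$, so it should shrink steadily as `n` increases.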


We may define $\alpha$ (and $\alpha'$) in terms of the quantile function $F^{-1}(u)$; that is

$$\alpha = \liminf_{u\to 1}\frac{-\log(1-u)}{\log F^{-1}(u)}.$$

The value $1/\alpha$ is what Parzen (1979) refers to as the tail exponent of $f(F^{-1}(u))$; i.e., if $f(F^{-1}(u))$ is regularly varying as $u \to 1$, then the exponent of regular variation is $1/\alpha$.

The function $\Lambda(x) = -\log(1 - F(x))$ is often referred to as the cumulative hazard function. Its derivative $\lambda(x) = f(x)/(1 - F(x))$ is the hazard rate. Suppose for large $x$ that $\lambda(x)$ is bounded below by $\beta x^\delta$ where $\beta > 0$. If $\delta \ge -1$ then $\alpha > 0$, and if $\delta \ge 0$ then $\alpha' > 0$. On the other hand, suppose $\lambda(x)$ is bounded above for large $x$ by $\beta x^\delta$ where $\beta > 0$. If $\delta < -1$ then $\alpha = 0$, and if $\delta < 0$ then $\alpha' = 0$.


Other authors have used stronger local conditions than assumption v to obtain convergence of moments of quantiles. Sen (1959) used the existence of $f'$ in a neighborhood of $q$ to obtain a higher order approximation (than (2.3.12)) of $F$ in that neighborhood. Bickel (1967) used the representation $F^{-1}(u) - q = (u - c)/f(x(u))$ where $x(u)$ is between $q$ and $F^{-1}(u)$ and thus required $f$ continuous in a neighborhood of $q$.

When $g$ is unbounded condition ii may be written as

ii'. There exist $\beta, x_0 > 0$ such that if $t \ge 1$, $x \ge x_0$, then $\log g(tx) \le t\beta \log g(x)$.

We have chosen this condition to simplify the proof of lemma 2.3.2. This inequality implies that if $x \ge x_0$ and $t \ge 1$ then $g(tx) \le g(x)^{t\beta}$, which implies that $g(x)$ grows at most exponentially for large $x$. The crucial implication of ii' (which is not implied by the fact that $g$ is bounded by an exponential function) is that

$$(1/2)\log n + \log g(\sqrt{n}(x - q)) - n\log g(x) \to -\infty$$

uniformly for $x \ge x_1$ for some $x_1 > q$ as $n \to \infty$; i.e., this implies that $\log g(\sqrt{n}(x - q)) \le k\sqrt{n}\log g(x)$ for $x \ge x_1$, $n$ large and some $k$, which is exactly what we need at the end of the proof of lemma 2.3.2.


Chapter  3 

Expansions  for  moments  of  robust  statistics 


§3.1  Introduction. 

A theorem showing asymptotic expansions for moments for a class of robust statistics is given in this chapter. Included in this class are many statistics which may be written as a functional of the empirical distribution function; i.e., they may be written as $T(F_n)$ where $F_n$ is an empirical distribution function based on $n$ independent, identically distributed (iid) random variables with underlying cumulative distribution function (cdf) $F$. The basic result is that the tail condition on $F$ given in chapter 2 which implies convergence of moments of standardized quantiles is found to be sufficient to imply convergence of moments of $\sqrt{n}(T(F_n) - T(F))$ to moments of expansions of $T(F_n)$ about $F$ obtained using a version of Frechet differentiation. The result gives higher order approximations of moments if the defining functional has higher order Frechet derivatives.

We  begin  with  two  short  sections  establishing  some  notation  and  definitions.  Then  we 
present  our  basic  theorem  and  its  proof.  All  of  the  theory  of  Frechet  differentiation  needed  is 
presented  in  sections  4  and  5.  In  section  5  we  apply  our  basic  theorem  to  give  general  formulas  for 
first  and  second  order  approximations  to  the  mean  and  mean  squared  error.  Applications  of  these 
results  to  L-  and  M-estimates  are  given  in  the  next  chapter. 

§3.2  Notation. 

Much  of  the  notation  needed  for  the  remainder  of  the  thesis  is  given  in  this  and  the  following 
section.  In  this  section  we  present  some  preliminary  notation  which  will  be  needed  in  our  discussion 
of functional statistics and Frechet differentiation. The definition of Frechet differentiation and
corresponding  notation  will  be  given  in  the  next  section. 
$\Re$ : the space of real numbers.

cdf : cumulative probability distribution function.

$\mathcal{D}$ : the space of finite linear combinations of one dimensional cdf's.

$G$ (or $G_i$) : an arbitrary element of $\mathcal{D}$; often a cdf or a difference of cdf's.

$F$ : an arbitrary element of $\mathcal{D}$; usually a cdf.

$X_1, X_2, \ldots$ : an iid sequence with cdf $F$.

$\delta_x$ : the cdf with mass one at $x$.

$F_n = (1/n)\sum_{i=1}^n \delta_{X_i}$ : the empirical cdf after $n$ iid observations from $F$.

$Z_i = \delta_{X_i} - F$ : a $\mathcal{D}$-valued random variable.

$\|\cdot\|$ : an arbitrary norm on $\mathcal{D}$.

$\|\cdot\|_\infty$ : the sup norm on $\mathcal{D}$; i.e., for $G \in \mathcal{D}$, $\|G\|_\infty = \sup_{-\infty<x<\infty}|G(x)|$.

$D_n = \|F_n - F\|_\infty$ : the Kolmogorov-Smirnov statistic after $n$ observations.
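As a concrete reading of this notation, the sketch below computes the empirical cdf $F_n$ and the Kolmogorov-Smirnov statistic $D_n$ for a given sample. It is an illustration only, with hypothetical function names, and it assumes that the reference cdf $F$ is continuous, so that the supremum is attained at or just before a sample point.

```python
import bisect

def empirical_cdf(sample):
    """Return F_n, the right-continuous step function with mass 1/n at each X_i."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        return bisect.bisect_right(xs, x) / n

    return F_n

def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous cdf F.

    At the i-th order statistic (0-based) F_n jumps from i/n to (i+1)/n,
    so both one-sided gaps are checked there.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d
```

For the sample 0.1, 0.4, 0.7 and the uniform cdf on the unit interval, the largest gap occurs at 0.7, where $F_n$ jumps to 1.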


§3.3  Functional  differentiation  and  von  Mises  expansions. 

Definitions  pertaining  to  functional  differentiation  which  will  be  needed  are  collected  in  this 
section.  We  consider  only  Frechet  differentiation  as  we  need  bounds  for  functionals  in  our  proofs 
which are simply provided using Frechet differentiation. Theorems applying Gateaux differentiation
are  not  studied  in  this  work.  Recent  surveys  on  functional  differentiation  and  its  applications  in 
statistics  are  given  by  Reeds  (1976),  Serfling  (1980),  and  Huber  (1981). 

Definition 3.3.1. Let $\mathcal{D}$ be the space of finite linear combinations of distribution functions. A functional $T_k(F; G_1, \ldots, G_k)$ with $F$ fixed which maps $\mathcal{D}^k$ into $\Re$ will be said to be $k$-linear if

$$T_k(F; G_1, \ldots, G_k) = \int\cdots\int h_k(F; x_1, \ldots, x_k)\prod_{i=1}^k dG_i(x_i)$$

for some real valued $h_k(F; x_1, \ldots, x_k)$ which is symmetric in $x_1, x_2, \ldots, x_k$. We let $T_k(F; G) = T_k(F; G, \ldots, G)$. The function $h_k$ is said to be the kernel of $T_k$.

The function $h_1$ is usually referred to as the influence curve in the literature of statistics.


Definition 3.3.2. Let $\|\cdot\|$ be a norm on $\mathcal{D}$. Suppose $T$ is a real valued functional defined on $\mathcal{T} \subset \mathcal{D}$ where $\mathcal{T}$ contains a neighborhood of $F \in \mathcal{D}$; i.e., there exists $\delta > 0$ such that $G \in \mathcal{D}$ and $\|G\| < \delta$ implies $F + G \in \mathcal{T}$. Let $k$ be a positive integer. Suppose $T_j(F; G_1, \ldots, G_j)$ is a functional defined for $G_i \in \mathcal{D}$, $1 \le i \le j$, which is $j$-linear, $1 \le j \le k$. Let $T_0(F; G) \equiv T(F)$. If, for $0 \le i \le k$,

$$\frac{R_i(F; G)}{\|G\|^i} = \frac{T(F + G) - \sum_{j=0}^i T_j(F; G)/j!}{\|G\|^i}$$

goes to zero as $\|G\|$ goes to zero then $T$ is said to be $k$ times Frechet differentiable with respect to the norm $\|\cdot\|$ at $F$. Furthermore, $T_j(F; \cdot)$ is called the $j$th Frechet differential of $T$ at $F$, $0 \le j \le k$.

The usual candidate for $T_j(F; G)$ is

$$T_j(F; G) = \frac{d^j}{dh^j}T(F + hG)\Big|_{h=0}. \tag{3.3.1}$$

We will not say much about how to find $T_j$ and $h_j$; examples for M- and L-estimates are given in chapter 4.
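A small numerical illustration of (3.3.1) may be helpful here. The sketch below is not taken from the thesis: it uses the variance functional $T(H) = \int x^2\,dH - (\int x\,dH)^2$ as an assumed example, represents elements of $\mathcal{D}$ with finite support as dictionaries mapping points to (possibly negative) masses, and estimates the derivatives in $h$ by central finite differences. All function names are hypothetical. For the direction $G = \delta_x - F$ a direct calculation gives $T_1(F; G) = (x - \mu)^2 - \sigma^2$ and $T_2(F; G) = -2(x - \mu)^2$, where $\mu$ and $\sigma^2$ are the mean and variance of $F$.

```python
def moment(H, p):
    """Integral of x**p with respect to the signed discrete measure H."""
    return sum(w * x**p for x, w in H.items())

def variance_functional(H):
    """T(H) = int x^2 dH - (int x dH)^2 (the example functional of this sketch)."""
    return moment(H, 2) - moment(H, 1) ** 2

def combine(F, G, h):
    """Return F + h*G as a dictionary of point masses."""
    out = dict(F)
    for x, w in G.items():
        out[x] = out.get(x, 0.0) + h * w
    return out

def point_mass_minus(F, x):
    """Return the direction delta_x - F."""
    G = {t: -w for t, w in F.items()}
    G[x] = G.get(x, 0.0) + 1.0
    return G

def differential(T, F, G, order, step=1e-3):
    """Central finite-difference estimate of d^order/dh^order T(F + hG) at h = 0."""
    up, down = T(combine(F, G, step)), T(combine(F, G, -step))
    if order == 1:
        return (up - down) / (2 * step)
    if order == 2:
        return (up - 2 * T(F) + down) / step**2
    raise ValueError("only orders 1 and 2 are sketched here")
```

Because $T(F + hG)$ is a quadratic polynomial in $h$ for this functional, the finite differences agree with the closed forms up to rounding error. With $F$ placing masses 0.5, 0.3, 0.2 at 0, 1, 3 (so $\mu = 0.9$, $\sigma^2 = 1.29$) and $x = 2$, the first and second differentials are $-0.08$ and $-2.42$.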

In  the  case  it  considers,  the  conditions  of  the  above  definition  are  slightly  weaker  than  those 
of  the  standard  definition  of  Frechet  differentiability  given  by  Reeds  (1976),  p.  151.  It  would  be  more 
appropriate  to  say  that  if  T  satisfies  the  conditions  of  definition  3.3.2,  then  T  has  a  kth  order  Taylor 
expansion  about  F  with  remainder  o(||  G  ||fc).  For  the  sake  of  brevity  we  do  not  do  this.  Because  the 
requirements  are  weaker,  results  which  assume  definition  3.3.2  also  hold  if  the  standard  definition 
of  Frechet  differentiability  is  assumed  instead.  However,  we  do  not  need  to  show  that  the  additional 
conditions  of  the  latter  definition  hold  to  apply  our  results.  We  obtain  the  standard  definition  of 
Frechet  differentiation  by  adding  additional  assumptions  to  definition  3.3.2.  First,  we  must  have  that 
for some $\epsilon > 0$ and any $H \in \mathcal{D}$ with $\|H - F\| < \epsilon$ the functional $T$ is $k$ times differentiable at $H$ by definition 3.3.2. We must also have that $T_j(H; G_1, \ldots, G_j) - T_j(F; G_1, \ldots, G_j) \to 0$ as $\|H - F\| \to 0$, and that $T_j(H; G_1, \ldots, G_j)$ is uniformly bounded for $\|H - F\| < \epsilon$ if $G_i \in \mathcal{D}$ and $\|G_i\| \le 1$, $1 \le i \le j \le k$. We have not shown that any ‘interesting’ functionals are differentiable by definition
3.3.2  but  are  not  differentiable  by  the  standard  definition.  The  proofs  of  differentiability  given  in 
chapter  4  do  not  appear  to  have  trivial  extensions  to  show  that  the  conditions  of  the  latter  definition 
hold. 

Serfling  (1980),  p.  217,  gives  a  definition  of  first  order  differentiability  comparable  to 
definition 3.3.2. He requires only that $T_1(F; \cdot)$ be defined on the space of differences of distribution
functions  rather  than  on  the  linear  space  generated  by  distribution  functions.  This  addition  to  the 
domain  of  definition  is  found  to  be  useful  in  lemmas  3.4.4  and  3.5.1  below.  We  can  prove  the  same 
results given here if T is only defined for distribution functions by requiring that the results of these
lemmas  hold.  The  given  definition  allows  us  to  develop  a  slightly  more  pleasing  theory  and  does  not 
really  cost  us  anything  in  terms  of  our  particular  applications. 

Frechet  differentiation  is  only  one  type  of  functional  differentiation.  Other  notions 
of  functional  differentiation  which  are  useful  in  statistics  are  referred  to  as  Gateaux  and  compact 
differentiation.  These  definitions  differ  in  their  requirements  on  the  remainder  term,  with  Frechet 
differentiation  having  the  strongest  requirement. 

See  Reeds  (1976)  for  more  discussion  on  many  of  the  above  matters. 

Definition 3.3.3. Assume $X_1, X_2, \ldots$ are iid $F$ and let $F_n$ denote the empirical cdf of the first $n$ observations. Suppose the domain of definition of $T$ includes all empirical cdf's. Then the random variable $T(F_n)$ will be referred to as a functional statistic. The expansion

$$T(F_n) = \sum_{j=0}^k T_j(F; F_n - F)/j! + R_k(F; F_n - F) = \sum_{j=0}^k T_{j,n}/j! + R_{k,n}$$

will be referred to as the ($k$th order) von Mises expansion of $T(F_n)$. The random variable $R_{k,n}$ will be referred to as the remainder term of the expansion.

The name von Mises expansion derives from the pioneering work of von Mises (1947) in the application of functional differentiation to statistics. When $F_n - F$ is replaced by an arbitrary $G \in \mathcal{D}$ the expansion is also referred to as a Taylor's series expansion.
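A simple case in which the von Mises expansion terminates may help fix ideas. The sketch below is an illustration and not an example from the thesis: for the variance functional $T(H) = \int x^2\,dH - (\int x\,dH)^2$ a direct calculation (an assumption of this sketch) gives $T_{1,n} = n^{-1}\sum_i[(X_i - \mu)^2 - \sigma^2]$ and $T_{2,n} = -2(\bar{X} - \mu)^2$, and the second order remainder $R_{2,n}$ is identically zero. The function name is hypothetical.

```python
def von_mises_variance(sample, mu, sigma2):
    """Terms of the second order von Mises expansion of the plug-in variance.

    mu and sigma2 are the mean and variance of the underlying cdf F.
    Returns (T(F_n), T_{1,n}, T_{2,n}, R_{2,n}).
    """
    n = len(sample)
    xbar = sum(sample) / n
    t1 = sum((x - mu) ** 2 - sigma2 for x in sample) / n   # first differential
    t2 = -2.0 * (xbar - mu) ** 2                            # second differential
    plug_in = sum((x - xbar) ** 2 for x in sample) / n      # T(F_n)
    remainder = plug_in - sigma2 - t1 - t2 / 2.0            # R_{2,n}
    return plug_in, t1, t2, remainder
```

For the sample 1, 2, 4, 7 with $\mu = 3$ and $\sigma^2 = 4$ this gives $T(F_n) = 5.25$, $T_{1,n} = 1.5$, $T_{2,n} = -0.5$, and a remainder of zero, in agreement with $T(F_n) = T(F) + T_{1,n} + T_{2,n}/2$.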

The  primary  applications  of  functional  differentiation  in  statistics  have  been  to  approximate 
functional  statistics  using  von  Mises  expansions,  and  then  to  extend  results  for  these  approximations 
(which  are  usually  easy  to  obtain)  to  the  functional  statistic.  Some  typical  results  obtained  are 
extensions  of  the  central  limit  theorem,  of  the  law  of  the  iterated  logarithm,  of  the  theory  of 
Edgeworth  expansions,  and  of  the  Berry-Esseen  theorem  (see  Reeds  (1976)  and  Serfling  (1980)). 
As  an  example  of  an  application  which  we  will  use  we  consider  the  following  central  limit  theorem 
which  is  very  similar  to  results  given  by  Boos  and  Serfling  (1980)  and  Serfling  (1980). 

Theorem 3.3.4. Suppose $F$ is a cdf and $T$ is defined on $\mathcal{T}$ which contains $F$ and all empirical cdf's. For $G \in \mathcal{D}$ let $\|G\|_\infty = \sup_{-\infty<x<\infty}|G(x)|$. Suppose that $T$ has a Frechet differential $T_1$ at $F$ with respect to $\|\cdot\|_\infty$ which is not identically 0. If $0 < \sigma^2 = E[(T_1(F; \delta_{X_1} - F))^2] < \infty$, then $\sqrt{n}(T(F_n) - T(F))$ converges in distribution to $N(0, \sigma^2)$ as $n \to \infty$.


Proof: Since $\|F_n - F\|_\infty = O_p(n^{-1/2})$ we have from the definition of Frechet differentiability that

$$T(F_n) - T(F) = T_1(F; F_n - F) + o_p(n^{-1/2}). \tag{3.3.2}$$

Since $T_1$ is linear in its second argument and $F_n - F = \sum_{i=1}^n(\delta_{X_i} - F)/n$ it follows that

$$T_1(F; F_n - F) = \frac{1}{n}\sum_{i=1}^n T_1(F; \delta_{X_i} - F).$$

By the assumption that $E[(T_1(F; \delta_{X_1} - F))^2] < \infty$ it follows that $E[T_1(F; \delta_{X_1} - F)]$ is well defined. From definition 3.3.1 we see that $E[T_1(F; \delta_{X_1} - F)] = E[h_1(F; X_1)] - \int h_1(F; x)\,dF(x) = 0$. Thus by the central limit theorem for iid random variables with finite variance $\sqrt{n}\,T_1(F; F_n - F)$ converges in distribution to $N(0, \sigma^2)$. The contention now follows from this and (3.3.2). ∎


§3.4  A  general  moment  result. 


In  this  section  we  will  state  and  prove  a  result  which  may  be  applied  to  show  moment 
expansions  for  a  wide  variety  of  robust  statistics.  We  begin  by  stating  the  basic  theorem. 


Theorem 3.4.1. Suppose $X_1, X_2, \ldots$ are iid with cdf $F$. Let the empirical cdf of $X_i$, $1 \le i \le n$, be denoted by $F_n$. Let $X_{1:n}, \ldots, X_{n:n}$ denote the order statistics of $X_i$, $1 \le i \le n$. Suppose $T$ satisfies the following three conditions:

i. $T$ is defined on $\mathcal{T} \subset \mathcal{D}$ where $\mathcal{T}$ contains $F$ and all empirical cdf's.

ii. $T$ is $k$ times Frechet differentiable at $F$ with respect to the sup norm $\|\cdot\|_\infty$.

iii. There exist constants $\delta$, $\eta$, $N$, and $m$, all greater than zero, and $\epsilon \in (0, 1/2)$, such that for all $n > N$ if $a_n \le \epsilon n$ and $b_n \ge (1 - \epsilon)n$ then

$$|T(F_n)| \le \eta(|X_{a_n:n}| + |X_{b_n:n}| + 2\delta)^m.$$

Suppose $F$ satisfies

iv. $\alpha_+ = \liminf_{x\to\infty}\dfrac{-\log(1 - F(x))}{\log x} > 0$ and $\alpha_- = \liminf_{x\to\infty}\dfrac{-\log F(-x)}{\log x} > 0$.

Recall that $T_{j,n} = T_j(F; F_n - F)$ is the $j$th differential of $T$ at $F$ in the direction $F_n - F$. Under assumptions i-iv, for any positive integer $r$

$$E[(T(F_n) - T(F))^r] = E\left[\left(\sum_{j=1}^k T_{j,n}/j!\right)^r\right] + o(n^{-(r+k-1)/2}) \tag{3.4.1}$$

and

$$\left(E[|T(F_n) - T(F)|^r]\right)^{1/r} = \left(E\left[\left|\sum_{j=1}^k T_{j,n}/j!\right|^r\right]\right)^{1/r} + o(n^{-k/2}). \tag{3.4.2}$$

Remarks. 

1) If we replace the inequality of iii with

$$X_{a_n:n} - \delta \le T(F_n) \le X_{b_n:n} + \delta$$

and iv does not hold, then $E[|T(F_n) - T(F)|^r] = \infty$ by theorem 2.2.6. Thus iv is a necessary condition in this case.

2) We will argue (not rigorously) in section 3.5 that if $T$ is $k + 1$ times Frechet differentiable then the remainder term in (3.4.1) should be $O(n^{-[[(r+k+1)/2]]})$ where $[[a]]$ denotes the greatest integer less than or equal to $a$. The general method of approximating $E[(\sum_j T_{j,n}/j!)^r]$ will also be given in section 3.5.

The remainder of this section consists of a series of lemmas which will be used to prove theorem 3.4.1. The fundamental lemma to be used is the following:


Lemma 3.4.2. Suppose $X_1, X_2, \ldots$ are iid with cdf $F$. Let $T$ be a functional which is defined on $\mathcal{T} \subset \mathcal{D}$ where $\mathcal{T}$ contains $F$ and all empirical cdf's. Suppose also that $T$ satisfies conditions i-iii and $F$ satisfies iv of theorem 3.4.1. Let

$$R_{k,n} = T(F_n) - T(F) - \sum_{j=1}^k T_{j,n}/j!$$

be the remainder of the $k$th order von Mises expansion. Then for any positive integer $r$

$$E[(n^{k/2}|R_{k,n}|)^r] \to 0 \quad\text{as } n \to \infty.$$

We delay the proof of this lemma until some preliminary results have been established. The first two lemmas are the keys to proving our main results. Lemma 3.4.3 was first given by Dvoretzky, Kiefer, and Wolfowitz (1956). It is very useful for obtaining uniform integrability results when using Frechet differentiation with respect to $\|\cdot\|_\infty$. Lemma 3.4.4 gives bounds for the differentials $T_k(F; \cdot)$ which will be needed.


Lemma 3.4.3. Let $D_n = \|F_n - F\|_\infty$ be the Kolmogorov-Smirnov statistic after $n$ iid observations from $F$. Then there exists $c_1 > 0$ such that for all $n$ and $x$

$$P\{\sqrt{n}D_n > x\} \le c_1e^{-2x^2}.$$
Lemma 3.4.4. If $T$ is $k$ times Frechet differentiable at $F$ with respect to the norm $\|\cdot\|$ on $\mathcal{D}$, then there exists $A > 0$ such that for any $G \in \mathcal{D}$, $|T_k(F; G)| \le A\|G\|^k$.

Proof: Let $\delta > 0$. From the differentiability assumption we know that there exists $\epsilon_\delta > 0$ such that if $\|G\| = \epsilon_\delta$ then $|R_k(F; G)| \le \delta\|G\|^k$ and $|R_{k-1}(F; G)| \le \delta\|G\|^{k-1}$. Note that this holds even for $k = 1$. Since $T_k(F; G)/k! = R_{k-1}(F; G) - R_k(F; G)$ this implies that $|T_k(F; G)| \le k!\,\delta\|G\|^{k-1}(1 + \|G\|)$.

Now suppose the contention is not true. Then there exist $G_i \in \mathcal{D}$, $i = 1, 2, \ldots$ such that $|T_k(F; G_i)| > i\|G_i\|^k$. Since for any $a \in \Re$, $G \in \mathcal{D}$ we have $T_k(F; aG) = a^kT_k(F; G)$ we may assume without loss of generality that $\|G_i\| = \epsilon_\delta$. Since for $i$ sufficiently large $|T_k(F; G_i)| > k!\,\delta\epsilon_\delta^{k-1}(1 + \epsilon_\delta)$ we have contradicted what we have shown in the paragraph above. ∎

The following definition of uniform integrability and corresponding result are slight variations on those given by Breiman (1968), p. 91. The result here follows from Breiman's result.

Definition 3.4.5. A sequence of random variables $Y_n$, $n = 1, 2, \ldots$ will be said to be uniformly integrable if for any $\epsilon > 0$ there exist $A_\epsilon$ and $N_\epsilon$ such that for all $n > N_\epsilon$,

$$E[|Y_n|\,I\{|Y_n| > A_\epsilon\}] < \epsilon.$$

Lemma 3.4.6. If $Y_n$ converges in distribution to $Y$ as $n \to \infty$ and $Y_n$, $n = 1, 2, \ldots$, is a uniformly integrable sequence, then $E[Y_n] \to E[Y]$ as $n \to \infty$.

Lemma 3.4.7. For any positive integer $r$, $(n^{1/2}D_n)^r$ is uniformly integrable.

Proof: This result follows from lemma 3.4.3 and definition 3.4.5 after applying integration by parts. ∎


Lemma 3.4.8. Suppose $F$ is a cdf. Suppose $T$ is defined on $\mathcal{T} \subset \mathcal{D}$ where $\mathcal{T}$ contains $F$ and all empirical cdf's. Suppose $T$ is $k$ times Frechet differentiable at $F$ with respect to the sup norm $\|\cdot\|_\infty$. Recall $T_{k,n} = T_k(F; F_n - F)$ is the $k$th differential of $T$ at $F$ evaluated at $F_n - F$. If $r$ is a positive integer, then $(n^{k/2}T_{k,n})^r$, $n = 1, 2, \ldots$, is uniformly integrable.

Proof: From lemma 3.4.4 we know that there exists $A > 0$ such that $|T_{k,n}| \le AD_n^k$. From lemma 3.4.7 we know that $(n^{k/2}D_n^k)^r$ is uniformly integrable and the contention follows. ∎

Lemma 3.4.9. Suppose $r$ is a positive integer, and $X_n^r$ and $Y_n^r$ are uniformly integrable. Then so is $(X_n + Y_n)^r$.

Proof: Since $|X_n + Y_n|^r \le 2^r(\max(|X_n|, |Y_n|))^r$ it follows that

$$E[|X_n + Y_n|^r I\{|X_n + Y_n|^r > A\}] \le E[2^r|X_n|^r I\{|X_n|^r > A2^{-r}\}] + E[2^r|Y_n|^r I\{|Y_n|^r > A2^{-r}\}].$$

Under the hypothesis, for any $\delta > 0$ there exists $A$ such that the right hand side of the above is less than $\delta$ for $n$ sufficiently large. ∎


Proof of lemma 3.4.2: Since the remainder $R_{k,n} = o(\|F_n - F\|_\infty^k)$ and $\|F_n - F\|_\infty = O_p(n^{-1/2})$ it follows that $n^{k/2}R_{k,n}$ goes in probability to 0. By lemma 3.4.6, the only thing that remains to be shown is that $(n^{k/2}|R_{k,n}|)^r$ is uniformly integrable. We let $0 < \gamma < 1/2$, $c > 0$, and let $r$ be a positive integer. We will use the decomposition

$$(n^{k/2}|R_{k,n}|)^r = (n^{k/2}|R_{k,n}|)^r I\{D_n \le cn^{-\gamma}\} + (n^{k/2}|R_{k,n}|)^r I\{D_n > cn^{-\gamma}\}. \tag{3.4.3}$$

The first term of (3.4.3) will be considered first. Choose $\delta > 0$. From the definition of Frechet differentiability we may choose $\epsilon_\delta$ so that for all $G$ with $\|G\|_\infty \le \epsilon_\delta$, $F + G \in \mathcal{T}$ and $|R_k(F; G)| \le \delta\|G\|_\infty^k$. Let $N \ge (c/\epsilon_\delta)^{1/\gamma}$. If $n \ge N$ then $cn^{-\gamma} \le \epsilon_\delta$ and

$$n^{k/2}|R_{k,n}|\,I\{D_n \le cn^{-\gamma}\} \le \delta n^{k/2}D_n^k.$$

From lemma 3.4.7 it follows that $(n^{k/2}|R_{k,n}|)^r I\{D_n \le cn^{-\gamma}\}$ is uniformly integrable.

Now we consider the second term of (3.4.3). Since $R_{k,n} = T(F_n) - T(F) - \sum_{j=1}^k T_{j,n}/j!$ it is sufficient (from lemma 3.4.9) to show the uniform integrability of $(n^{k/2}|T(F_n)|)^r I\{D_n > cn^{-\gamma}\}$, $(n^{k/2}|T(F)|)^r I\{D_n > cn^{-\gamma}\}$ and $(n^{k/2}|T_{j,n}|)^r I\{D_n > cn^{-\gamma}\}$, $j = 1, \ldots, k$.

By lemma 3.4.3, $(n^{k/2}|T(F)|)^r I\{D_n > cn^{-\gamma}\}$ is uniformly integrable.

From lemma 3.4.4 we know that $|T_{j,n}| \le AD_n^j$ for some $A > 0$. Applying lemma 3.4.3 and integration by parts we see that

$$E[(n^{k/2}|T_{j,n}|)^r I\{D_n > cn^{-\gamma}\}] \le A^rn^{r(k-j)/2}E[(n^{1/2}D_n)^{jr} I\{n^{1/2}D_n > cn^{1/2-\gamma}\}]$$

$$\le A^rn^{r(k-j)/2}\left[c_1\exp(-2c^2n^{1-2\gamma})\,(cn^{1/2-\gamma})^{jr} + \int_{cn^{1/2-\gamma}}^\infty jr\,x^{jr-1}c_1e^{-2x^2}\,dx\right].$$

The first term inside the brackets times the factor outside the brackets goes to zero as $n \to \infty$. Letting $y = 2x^2$ we can find positive constants $c_2$ and $c_3$ such that the second term inside the brackets times the factor outside is less than or equal to

$$c_2n^{r(k-j)/2}\int_{c_3n^{1-2\gamma}}^\infty y^{(jr-2)/2}e^{-y}\,dy.$$

If $n$ is sufficiently large, $a$ is a positive integer greater than $(jr - 2)/2$, and $\epsilon = 1 - 2\gamma > 0$, then this is less than

$$c_2n^{r(k-j)/2}\,a!\,\exp(-c_3n^\epsilon)\sum_{i=0}^a (c_3n^\epsilon)^i/i!$$

which goes to zero as $n \to \infty$. Thus $E[(n^{k/2}|T_{j,n}|)^r I\{D_n > cn^{-\gamma}\}]$ goes to 0 as $n \to \infty$, $j = 1, 2, \ldots, k$.
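The last bound rests on the identity $\int_b^\infty y^ae^{-y}\,dy = a!\,e^{-b}\sum_{i=0}^a b^i/i!$ for a nonnegative integer $a$, applied with $b = c_3n^\epsilon$ after bounding $y^{(jr-2)/2}$ by $y^a$ for $y \ge 1$. The following sketch, which is an illustration with hypothetical function names and not part of the original proof, compares the closed form with a truncated numerical integral.

```python
import math

def upper_gamma_closed_form(a, b):
    """a! * exp(-b) * sum_{i=0}^{a} b**i / i! for a nonnegative integer a."""
    partial = sum(b**i / math.factorial(i) for i in range(a + 1))
    return math.factorial(a) * math.exp(-b) * partial

def upper_gamma_numeric(a, b, width=50.0, steps=100000):
    """Midpoint-rule value of the integral of y**a * exp(-y) over [b, b + width]."""
    h = width / steps
    total = 0.0
    for i in range(steps):
        y = b + (i + 0.5) * h
        total += y**a * math.exp(-y) * h
    return total
```

Since the closed form is a polynomial in $b$ times $e^{-b}$, it tends to zero faster than any power of $n$ grows when $b = c_3n^\epsilon$, which is what the proof requires.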

Finally we consider $(n^{k/2}|T(F_n)|)^r I\{D_n > cn^{-\gamma}\}$. We let $\epsilon$ and $\delta$ be as in assumption iii of theorem 3.4.1. Let $a_n = [[n\epsilon]]$ and $b_n = [[n(1 - \epsilon)]] + 1$. Let $\beta \in (0, 1/4)$ and consider

$$(n^{k/2}(|X_{b_n:n}| + \delta))^{rm} I\{D_n > cn^{-\gamma}\}$$

$$= (n^{k/2}(|X_{b_n:n}| + \delta))^{rm} I\{D_n > cn^{-\gamma}\}\,I\{|X_{b_n:n}| > F^{-1}(1 - \epsilon + cn^{-\beta})\}$$

$$+\ (n^{k/2}(|X_{b_n:n}| + \delta))^{rm} I\{D_n > cn^{-\gamma}\}\,I\{|X_{b_n:n}| \le F^{-1}(1 - \epsilon + cn^{-\beta})\}.$$

The first term is uniformly integrable by lemma 2.3.2. The second term is bounded by $(n^{k/2}(F^{-1}(1 - \epsilon + cn^{-\beta}) + \delta))^{rm}P\{D_n > cn^{-\gamma}\}$. It goes to zero by lemma 3.4.3 since $\gamma < 1/2$. By lemma 3.4.9, $(n^{k/2}(|X_{b_n:n}| + \delta))^{rm} I\{D_n > cn^{-\gamma}\}$ is uniformly integrable. Similarly $(n^{k/2}(|X_{a_n:n}| + \delta))^{rm} I\{D_n > cn^{-\gamma}\}$ is uniformly integrable. By lemma 3.4.9 it follows that $(n^{k/2}(|X_{a_n:n}| + |X_{b_n:n}| + 2\delta))^{rm} I\{D_n > cn^{-\gamma}\}$ is uniformly integrable.

By assumption iii it follows that $(n^{k/2}T(F_n))^r I\{D_n > cn^{-\gamma}\}$ is a uniformly integrable sequence. ∎


Finally  we  prove  our  main  theorem. 

Proof of theorem 3.4.1. Since

$$T(F_n) - T(F) = \sum_{j=1}^k T_{j,n}/j! + R_{k,n}, \tag{3.4.4}$$

(3.4.2) follows from Minkowski's inequality and lemma 3.4.2.

Using (3.4.4) and expanding we have

$$E[(T(F_n) - T(F))^r] = \sum_{i=0}^r \binom{r}{i} E\left[R_{k,n}^i\left(\sum_{j=1}^k T_{j,n}/j!\right)^{r-i}\right].$$

For $1 \le i \le r$ it follows from Holder's inequality that

$$\left|E\left[R_{k,n}^i\left(\sum_{j=1}^k T_{j,n}/j!\right)^{r-i}\right]\right| \le \left(E[|R_{k,n}|^r]\right)^{i/r}\left(E\left[\left|\sum_{j=1}^k T_{j,n}/j!\right|^r\right]\right)^{(r-i)/r}.$$

From lemma 3.4.8 it follows that $(E[|T_{j,n}|^r])^{1/r} = O(n^{-j/2})$ for $j = 1, 2, \ldots, k$. From this and lemma 3.4.2 it follows that

$$\left(E[|R_{k,n}|^r]\right)^{i/r}\left(E\left[\left|\sum_{j=1}^k T_{j,n}/j!\right|^r\right]\right)^{(r-i)/r} = o(n^{-(ik+r-i)/2})$$

and since $ik + r - i \ge r + k - 1$, (3.4.1) follows. ∎


§3.5  Calculating  moment  approximations. 

In the previous section we have proven convergence of some moment approximations. We will now present some results on functional differentiation and von Mises expansions which are useful in applying this theory to actually calculate moment approximations. We also give formulas for bias, and first and second order mean squared error approximations in terms of the first three functional derivatives of a statistic. These formulas will be applied for L- and M-estimates in the next chapter.

We assume throughout the following that $X_1, X_2, \ldots$ is a sequence of iid random variables with cdf $F$ and that $T$ is a functional defined in a neighborhood of $F$ and for all empirical cdf's which is $k$ times Frechet differentiable at $F$ with respect to $\|\cdot\|_\infty$. It is a simple consequence of definition 3.3.1 that

$$T_k(F; G_1 + G_1', G_2, \ldots, G_k) = T_k(F; G_1, G_2, \ldots, G_k) + T_k(F; G_1', G_2, \ldots, G_k) \qquad (3.5.1)$$

and

$$T_k(F; aG_1, G_2, \ldots, G_k) = a\, T_k(F; G_1, G_2, \ldots, G_k). \qquad (3.5.2)$$


Lemma  3.5.1  is  used  in  the  proof  of  lemma  3.5.2  which  shows  that  the  kernels  of  these 
Frechet  differentials  are  bounded  when  differentiation  is  done  with  respect  to  the  sup  norm. 
Lemma 3.5.1. Suppose $k$ is a positive integer and $T$ is a functional defined on $\mathcal{F} \subset D$ which is $k$ times Frechet differentiable at $F$ with respect to a norm $\|\cdot\|$ on $D$. Then for $j = 1, 2, \ldots, k$, $T_j(F; G_1, \ldots, G_j)$ goes to zero uniformly as $\max_{1 \le i \le j} \|G_i\|$ goes to zero.

Proof: Fix $j$ and $m$ such that $1 < m \le j \le k$. Assume that if $\{G_1, \ldots, G_j\}$ contains exactly $m - 1$ distinct elements then $T_j(F; G_1, \ldots, G_j)$ goes to zero uniformly as $\max_{1 \le i \le j} \|G_i\|$ goes to zero. Now assume that $\{G_1, \ldots, G_j\}$ contains exactly $m$ distinct elements and that $G_1 \ne G_2$. By (3.5.1) and the symmetry of $T_j(F; \cdot)$ it follows that

$$T_j(F; G_1, \ldots, G_j) = (1/2)\Big[ T_j(F; G_1 + G_2, G_1 + G_2, G_3, \ldots, G_j) - T_j(F; G_1, G_1, G_3, \ldots, G_j) - T_j(F; G_2, G_2, G_3, \ldots, G_j) \Big].$$

Since $T_j(F; \cdot)$ on the right hand side of this equation has $m - 1$ distinct arguments in each case, it follows that they all go to zero uniformly as $\max_{1 \le i \le j} \|G_i\|$ goes to zero. The contention follows for $m = 1$ and arbitrary $j$, $1 \le j \le k$, by lemma 3.4.4. For arbitrary $j$, $1 \le j \le k$, the result follows for $1 < m \le j$ by induction on $m$. ∎


Lemma 3.5.2. Let $k$ be a positive integer. Suppose $T$ is $k$ times Frechet differentiable at $F$ with respect to $\|\cdot\|_\infty$. Then the kernel $h_k$ of $T_k$ is bounded.


Proof: The proof uses contraposition. Suppose $h_k(F; \cdot)$ is not bounded. Then there exist $x_{im}$, $1 \le i \le k$, such that $h_k(F; x_{1m}, \ldots, x_{km}) \ge m^{k+1}$, $m = 1, 2, \ldots$. Let $G_{im} = \delta_{x_{im}}/m$. This implies that $T_k(F; G_{1m}, \ldots, G_{km}) \ge m$ and that $\max_{1 \le i \le k} \|G_{im}\|_\infty = 1/m$ goes to zero as $m \to \infty$. This contradicts lemma 3.5.1. ∎

We let

$$Z_i(x) = \delta_{X_i}(x) - F(x). \qquad (3.5.3)$$

Since

$$F_n - F = n^{-1} \sum_{i=1}^{n} Z_i \qquad (3.5.4)$$

it follows from (3.5.1) and (3.5.2) that

$$T_{k,n} = T_k(F; F_n - F) = n^{-k} \sum_{i_1=1}^{n} \cdots \sum_{i_k=1}^{n} T_k(F; Z_{i_1}, \ldots, Z_{i_k}). \qquad (3.5.5)$$

Since $h_k$ is bounded we know that $T_k(F; Z_{i_1}, \ldots, Z_{i_k})$ is a bounded random variable. If $T$ has a $k$th differential and $r$ is a positive integer then by using (3.5.5) and expanding we find

$$E\Big[\Big(\sum_{j=1}^{k} T_{j,n}/j!\Big)^r\Big] = \sum E\Big[\prod_{l=1}^{r} n^{-j(l)}\, T_{j(l)}(F; Z_{i_{l1}}, \ldots, Z_{i_{l j(l)}})/j(l)!\Big], \qquad (3.5.6)$$

where the sum is appropriately defined (explicit examples of (3.5.6) will be derived more carefully in the proof of theorem 3.5.4). Symmetry arguments tell us that we need only compute a 'few' of the terms of the right hand side of (3.5.6) to get $E[(\sum_{j=1}^{k} T_{j,n}/j!)^r]$. The following lemma shows that even most of these terms are zero.

Lemma 3.5.3. Suppose at least one of the indices $i_{lm}$ occurs exactly once in

$$\prod_{l=1}^{r} T_{j(l)}(F; Z_{i_{l1}}, \ldots, Z_{i_{l j(l)}})/j(l)!.$$

Then

$$E\Big[\prod_{l=1}^{r} T_{j(l)}(F; Z_{i_{l1}}, \ldots, Z_{i_{l j(l)}})/j(l)!\Big] = 0.$$


Proof: Without loss of generality we assume $i_{11} = 1$, and if $(l, m) \ne (1, 1)$ then $i_{lm} \ne i_{11}$. Let $j = j(1)$. We condition on $Z_{i_{lm}} = z_{i_{lm}}$ for $(l, m) \ne (1, 1)$. It will suffice to show

$$E[T_j(F; Z_1, z_{i_{12}}, \ldots, z_{i_{1j}})] = 0.$$

By definition

$$E[T_j(F; Z_1, z_{i_{12}}, \ldots, z_{i_{1j}})] = E\Big[\int \cdots \int h_j(F; x_1, \ldots, x_j)\, dZ_1(x_1) \prod_{l=2}^{j} dz_{i_{1l}}(x_l)\Big].$$

Since $h_j$ is bounded by lemma 3.5.2 and all of the measures involved are bounded we may switch the order of integration to obtain

$$E[T_j(F; Z_1, z_{i_{12}}, \ldots, z_{i_{1j}})] = \int \cdots \int E\Big[\int h_j(F; x_1, \ldots, x_j)\, dZ_1(x_1)\Big] \prod_{l=2}^{j} dz_{i_{1l}}(x_l).$$

But

$$E\Big[\int h_j(F; x_1, \ldots, x_j)\, dZ_1(x_1)\Big] = E[h_j(F; X_1, x_2, \ldots, x_j)] - \int h_j(F; x_1, \ldots, x_j)\, dF(x_1) = 0$$

and the contention follows. ∎


Using (3.5.6) and lemma 3.5.3 we can show by counting the number of terms of various types that for $k \ge 1$

$$E[(T(F_n) - T(F))^r] = E\Big[\Big(\sum_{j=1}^{k} T_{j,n}/j!\Big)^r\Big] + O(n^{-\lfloor (r+k+1)/2 \rfloor}) \qquad (3.5.7)$$

if $T$ is $k + 1$ times Frechet differentiable. This was suggested after the statement of theorem 3.4.1 and will be shown for some particular cases in theorem 3.5.4.

We are now prepared to give formulas for first and second order mean and mean squared error approximations.




Theorem 3.5.4. Suppose $X_1, X_2, \ldots$ are iid with cdf $F$ and that $T$ is a functional defined in a neighborhood of $F$ and for all empirical cdf's. For the remainder of this proposition if we say $T$ is $k$ times Frechet differentiable, we mean at $F$ with respect to $\|\cdot\|_\infty$; also implicit in this statement will be that the approximation

$$E[(T(F_n) - T(F))^r] = E\Big[\Big(\sum_{j=1}^{k} T_{j,n}/j!\Big)^r\Big] + o(n^{-(r+k-1)/2})$$

is valid. Define $Z_i(x) = \delta_{X_i}(x) - F(x)$ as before. If $T$ is (once) Frechet differentiable then

$$E[T(F_n) - T(F)] = o(1/\sqrt{n}) \qquad (3.5.8)$$

and

$$E[(T(F_n) - T(F))^2] = \frac{1}{n} E[(T_1(F; Z_1))^2] + o(1/n). \qquad (3.5.9)$$

If $T$ is two times Frechet differentiable then

$$E[T(F_n) - T(F)] = \frac{1}{2n} E[T_2(F; Z_1, Z_1)] + o(1/n). \qquad (3.5.10)$$

If $T$ is three times Frechet differentiable then

$$E[(T(F_n) - T(F))^2] = \frac{1}{n} E[(T_1(F; Z_1))^2] + \frac{1}{n^2} E[T_1(F; Z_1)\, T_2(F; Z_1, Z_1)]$$

$$+ \frac{1}{2n^2} E[(T_2(F; Z_1, Z_2))^2] + \frac{1}{4n^2} \big(E[T_2(F; Z_1, Z_1)]\big)^2 \qquad (3.5.11)$$

$$+ \frac{1}{n^2} E[T_1(F; Z_1)\, T_3(F; Z_1, Z_2, Z_2)] + o(1/n^2).$$

Proof: Assume $T$ is one time Frechet differentiable. By assumption

$$E[T(F_n) - T(F)] = E[T_1(F; F_n - F)] + o(1/\sqrt{n}).$$

From (3.5.1)-(3.5.5) we have

$$E[T_1(F; F_n - F)] = E\Big[n^{-1} \sum_{i=1}^{n} T_1(F; Z_i)\Big].$$

Since $E[T_1(F; Z_i)]$ exists and is finite we may interchange the order of integration and summation. From lemma 3.5.3 we have $E[T_1(F; Z_i)] = 0$ and (3.5.8) follows.

Similar arguments justify the following calculation:

$$E[(T(F_n) - T(F))^2] = E[(T_1(F; F_n - F))^2] + o(1/n)$$

$$= n^{-2} \sum_{i=1}^{n} \sum_{j=1}^{n} E[T_1(F; Z_i)\, T_1(F; Z_j)] + o(1/n).$$

By symmetry we have $E[(T_1(F; Z_i))^2] = E[(T_1(F; Z_1))^2]$. By lemma 3.5.3 we have $E[T_1(F; Z_i)\, T_1(F; Z_j)] = 0$ for $j \ne i$, $1 \le i \le n$. Thus we have shown (3.5.9).

To obtain (3.5.10) we use a second order approximation:

$$E[T(F_n) - T(F)] = E[T_1(F; F_n - F) + (1/2)\, T_2(F; F_n - F)] + o(1/n)$$

$$= (2n)^{-1} E[T_2(F; Z_1, Z_1)] + o(1/n).$$

To obtain (3.5.11) we use a third order approximation:

$$E[(T(F_n) - T(F))^2] = E[(T_1(F; F_n - F) + (1/2)\, T_2(F; F_n - F) + (1/6)\, T_3(F; F_n - F))^2] + o(1/n^2)$$

$$= E[(T_1(F; F_n - F))^2] + E[T_1(F; F_n - F)\, T_2(F; F_n - F)]$$

$$+ (1/3)\, E[T_1(F; F_n - F)\, T_3(F; F_n - F)] + (1/4)\, E[(T_2(F; F_n - F))^2] + o(1/n^2).$$

The terms which 'disappear' in the last line are $o(1/n^2)$ from lemma 3.4.8 and Hölder's inequality. We have already derived $E[(T_1(F; F_n - F))^2]$ above. We continue with the other terms.

$$E[T_1(F; F_n - F)\, T_2(F; F_n - F)] = n^{-3} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} E[T_1(F; Z_i)\, T_2(F; Z_j, Z_k)]$$

$$= n^{-2} E[T_1(F; Z_1)\, T_2(F; Z_1, Z_1)].$$

$$E[T_1(F; F_n - F)\, T_3(F; F_n - F)] = n^{-4} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} E[T_1(F; Z_i)\, T_3(F; Z_j, Z_k, Z_l)]$$

$$= n^{-4} \binom{n}{2} \cdot 2 \cdot 3 \cdot E[T_1(F; Z_1)\, T_3(F; Z_1, Z_2, Z_2)] + o(1/n^2).$$

$$E[(T_2(F; F_n - F))^2] = n^{-4} \Big( n(n-1) \big(E[T_2(F; Z_1, Z_1)]\big)^2 + \binom{n}{2} \cdot 4 \cdot E[(T_2(F; Z_1, Z_2))^2] \Big) + o(1/n^2).$$

Equation (3.5.11) now follows from these calculations. ∎
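
The counting step in this proof can be checked by brute force. The sketch below is only an illustration (the helper name `count_nonvanishing` is an assumption of this note, not part of the thesis). By lemma 3.5.3 a term can be nonzero only if no index occurs exactly once, so the code enumerates all index tuples for small n and counts those without a singleton index.

```python
from collections import Counter
from itertools import product


def count_nonvanishing(n, order):
    """Count tuples in {1..n}^order in which no index occurs exactly once.

    By lemma 3.5.3 these are the only index tuples whose expectation can
    be nonzero. The name and interface are illustrative assumptions.
    """
    total = 0
    for idx in product(range(1, n + 1), repeat=order):
        if all(c >= 2 for c in Counter(idx).values()):
            total += 1
    return total


if __name__ == "__main__":
    for n in (2, 3, 5):
        # Three indices (T_1 times T_2): only i = j = k survives.
        print(n, count_nonvanishing(n, 3), n)
        # Four indices: n all-equal tuples plus 3 n (n - 1) two-pair tuples.
        print(n, count_nonvanishing(n, 4), n + 3 * n * (n - 1))
```

For four indices the two-pair tuples number 3n(n-1), which equals the factor binom(n,2) · 2 · 3 used for the T_1 T_3 term. For the squared T_2 term they split into n(n-1) tuples of the form (i,i,k,k) and 2n(n-1) = binom(n,2) · 4 tuples of the mixed forms. The n all-equal tuples contribute only at order n^{-3}.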


Chapter  4 

L- and M-estimates


§4.1  Introduction. 

In this chapter we will apply the theory of the previous chapter to derive expansions for moments of many L- and M-estimates. In particular, we will give first and second order approximations for the mean and mean squared error in each case. Applications of these approximations will be studied in the next chapter.

We  now  note  some  of  the  limitations  of  the  theory  of  chapter  3  in  applications  to  L-  and 
M-estimates.  Most  of  the  limitations  should  not  be  of  great  concern  to  those  wishing  to  consider 
robust  statistics. 

The  first  limitation  is  that  the  influence  function  (kernel  of  the  first  differential)  must  be 
bounded.  This  is  not  an  important  limitation  for  robustness  since  it  implies  that  changing  a  small 
proportion  of  the  observations,  in  general,  changes  the  value  of  the  statistic  by  at  most  a  limited 
amount.  For  L-estimates  this  limitation  means  we  must  exclude  a  positive  proportion  of  the  extreme 
observations  from  the  calculation  of  the  estimate. 

For M-estimates we have the requirement that the influence function be non-decreasing as well as bounded. We need this to give the quantile bound required by theorem 3.4.1. This eliminates M-estimates with redescending influence curves, which is an unfortunate limitation. Another limitation of the present treatment is that we have not considered simultaneous estimation of location and scale for M-estimates. Eynon (1982) uses some of the results given here to treat this case.

Finally, the condition of Frechet differentiability imposes 'strong smoothness' conditions on the estimators to be considered. For instance, quantiles are not Frechet differentiable, and the M-estimate referred to as Huber's proposal number one is only once Frechet differentiable. Examples of functionals which are not Frechet differentiable and efforts to patch up our theory in these cases are discussed in section 5.4. After the proofs of proposition 4.2.5 and lemma 4.3.5 we note the difficulties in showing that the standard definition of Frechet differentiability holds in cases where we show that our definition holds.


§4.2  Theory  for  L-estimates. 

We will now apply the theory of chapter 3 to develop a theory of moment convergence for L-estimates. The results on moment approximation contained in this section are summarized by (4.2.1), lemma 4.2.4 and proposition 4.2.6. The L-estimates we will consider are defined for any cdf $F$ by

$$T(F) = \int_0^1 J(u) F^{-1}(u)\, du \qquad (4.2.1)$$

when the integral is well defined and $J$ is a real valued function. We do not use the more general version

$$T(F) = \int_0^1 J(u) F^{-1}(u)\, du + \sum_{i=1}^{m} a_i X_{\lfloor c_i n \rfloor + 1:n} \qquad (4.2.2)$$

since then $T$ would not be Frechet differentiable; this is discussed in section 5.4. Recall that a result on the moments of $a_i X_{\lfloor c_i n \rfloor + 1:n}$ was given in proposition 2.3.11.

Note that the functional of (4.2.1) is not defined on all of $D$ since not all elements of $D$ have an inverse. Thus before showing the Frechet differentiability of the functional defining an L-estimate, we must extend the definition to $D$. The concept of bounded variation and some simple properties related to it will be needed for this definition and for the proof of Frechet differentiability for both L- and M-estimates.


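Definition 4.2.1. A function $g$ is said to have bounded variation on $[a, b]$ if

$$V_a^b = \sup\Big\{ \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| : n < \infty \text{ and } a \le x_0 < \cdots < x_n \le b \Big\} < \infty.$$

If

$$\|g\|_{TV} = \lim_{a \to -\infty,\ b \to \infty} V_a^b < \infty$$

we say that $g$ has bounded total variation.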




If $\|g\|_{TV} < \infty$ then $g$ is bounded and may be written as the difference of two bounded, monotone non-decreasing functions. Thus for any function of bounded variation there is a corresponding finite signed measure. When we write an integral with a differential element $dg$ where $g$ is a function of bounded variation this will mean integration is to be done with respect to the finite signed measure corresponding to $g$. If $g$ is continuous and has bounded variation on $[a, b]$ then it is also uniformly continuous there. The following lemma is a special case of a result given by Rudin (1964), p. 122.

Lemma 4.2.2. Suppose $f$ and $g$ are real-valued functions of bounded variation on $[a, b]$ and $f$ is also continuous. Then

$$\int_a^b f\, dg = f(b) g(b) - f(a) g(a) - \int_a^b g\, df.$$


Definition 4.2.3. Let $J$ be a function of bounded variation on $[0, 1]$ which is 0 outside of $[0, 1]$. For any $G \in D$ where the integral is well defined we let

$$T(G) = \int_{-\infty}^{\infty} x\, d\Big( \int_0^{G(x)} J(u)\, du \Big).$$

If $F_n$ is an empirical cdf generated by a set of $n$ iid random variables, then $T(F_n)$ is said to be an L-estimate.

We will be interested in the case in which $J$ is zero outside of some interval $[\delta_1, 1 - \delta_2]$ where $0 < \delta_1 < 1 - \delta_2 < 1$. In this case we now show that if $F$ is a cdf then $T(F)$ of definition 4.2.3 has the same value as $T(F) = \int_0^1 J(u) F^{-1}(u)\, du$ of (4.2.1).


Lemma 4.2.4. Suppose $F$ is a cdf and $J$ is a function of bounded variation on $[0, 1]$ which is zero outside of some interval $[\delta_1, 1 - \delta_2]$ where $0 < \delta_1 < 1 - \delta_2 < 1$. Then

$$\int_0^1 J(u) F^{-1}(u)\, du = \int_{-\infty}^{\infty} x\, d\Big( \int_0^{F(x)} J(u)\, du \Big).$$

Proof: We begin by rewriting the single integral of the lemma as a sum of two double integrals,

$$\int_0^1 J(u) F^{-1}(u)\, du = -\int_0^{F(0)} J(u) \int_{F^{-1}(u)}^{0} dx\, du + \int_{F(0)}^{1} J(u) \int_0^{F^{-1}(u)} dx\, du. \qquad (4.2.3)$$

We will not show here that $\int_0^{F(x)} J(u)\, du$ is a function of bounded variation; the argument is similar to one given below in the proof of proposition 4.2.5. Let $a$ be such that $F(a) < \delta_1$ and $|a| < \infty$. If $a < 0$ the first of these terms may be rewritten using first Fubini's theorem and then lemma 4.2.2 as

$$-\int_{F(a)}^{F(0)} J(u) \int_{F^{-1}(u)}^{0} dx\, du = -\int_a^0 \int_{F(a)}^{F(x)} J(u)\, du\, dx$$

$$= -x \int_{F(a)}^{F(x)} J(u)\, du\, \Big|_a^0 + \int_a^0 x\, d\Big( \int_{F(a)}^{F(x)} J(u)\, du \Big)$$

$$= \int_{-\infty}^{0} x\, d\Big( \int_0^{F(x)} J(u)\, du \Big).$$

The result follows by using (4.2.3), this expression, and the analogous expression for the second term of (4.2.3). ∎

For the following proposition recall the definition of a kernel (definition 3.3.1) and our definition of Frechet differentiability (definition 3.3.2), which has weaker conditions than the standard definition. Our proof is an extension of the analysis of Boos (1979); his proof also appears in Serfling (1980), pp. 281-282. Boos' proof is for the first differential. Note that when we say $J$ is continuous a.e. with respect to $F^{-1}$ for some cdf $F$, we mean with respect to the measure corresponding to the monotone function $F^{-1}$.



Proposition 4.2.5. Let $J$ be defined on $[0, 1]$ with $J(u) = 0$ for $u < \delta_1$ and $u > 1 - \delta_2$ where $0 < \delta_1 < 1 - \delta_2 < 1$. Let $\mathcal{F}$ be the elements of $D$ for which

$$T(G) = \int_{-\infty}^{\infty} x\, d\Big( \int_0^{G(x)} J(u)\, du \Big)$$

is well defined. Suppose $F \in \mathcal{F}$ is a cdf. Let $k$ be a positive integer. Let $J^{(j)}$ denote the $j$th derivative of $J$ where it exists, $j = 1, 2, \ldots, k$. Let $J^{(0)} = J$. Assume that $J^{(j)}$, $j = 0, 1, \ldots, k - 2$, are bounded and absolutely continuous. Assume that $J^{(k-1)}$ is bounded and continuous a.e. with respect to $F^{-1}$.

Then $T$ is $k$ times Frechet differentiable at $F$ with respect to $\|\cdot\|_\infty$. For $j = 1, \ldots, k$ the kernel of the $j$th Frechet differential of $T$ at $F$ with respect to $\|\cdot\|_\infty$ is

$$h_j(F; x_1, \ldots, x_j) = -\int_{\max_i x_i}^{\infty} J^{(j-1)}(F(y))\, dy. \qquad (4.2.4)$$

We delay the proof of this proposition. For the remainder of the section when limits of integration are not given explicitly, they will always be $\pm\infty$.

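Proposition 4.2.6. Let $F$ be a cdf such that

$$0 < \liminf_{x \to \infty} \frac{-\log(1 - F(x))}{\log x} \quad \text{and} \quad 0 < \liminf_{x \to \infty} \frac{-\log F(-x)}{\log x}.$$
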

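Let $J(u)$ be defined on $[0, 1]$ with $J(u) = 0$ for $u < \delta_1$ and $u > 1 - \delta_2$ where $0 < \delta_1 < 1 - \delta_2 < 1$. Let $\mathcal{F}$ be the elements of $D$ for which

$$T(G) = \int_{-\infty}^{\infty} x\, d\Big( \int_0^{G(x)} J(u)\, du \Big)$$

is well defined. Let $k$ be a positive integer. Assume that $J^{(j)}$, $j = 0, 1, \ldots, k - 2$, are bounded and absolutely continuous. Assume that $J^{(k-1)}$ is bounded and continuous a.e. with respect to $F^{-1}$. Let $T_{j,n}$ denote the $j$th Frechet differential $T_j(F; F_n - F)$ of $T$ at $F$, $j = 1, 2, \ldots, k$. Then for any positive integer $r$

$$E[(T(F_n) - T(F))^r] = E\Big[\Big(\sum_{j=1}^{k} T_{j,n}/j!\Big)^r\Big] + o(n^{-(r+k-1)/2}). \qquad (4.2.5)$$

If $k \ge 1$ then

$$E[T(F_n) - T(F)] = o(1/\sqrt{n}) \qquad (4.2.6)$$

and

$$E[(T(F_n) - T(F))^2] = \frac{1}{n} \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big) \prod_{i=1}^{2} J(F(x_i))\, dx_i + o(1/n). \qquad (4.2.7)$$

If $k \ge 2$ then

$$E[T(F_n) - T(F)] = -\frac{1}{2n} \int F(x)(1 - F(x))\, J^{(1)}(F(x))\, dx + o(1/n). \qquad (4.2.8)$$

Finally, if $k \ge 3$ then

$$E[(T(F_n) - T(F))^2] = \frac{1}{n} \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big) \prod_{i=1}^{2} J(F(x_i))\, dx_i$$

$$+ \frac{1}{n^2} \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big) (1 - 2F(x_2))\, J(F(x_1))\, J^{(1)}(F(x_2))\, dx_1\, dx_2$$

$$+ \frac{1}{2n^2} \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big)^2 \prod_{i=1}^{2} J^{(1)}(F(x_i))\, dx_i$$

$$+ \frac{1}{4n^2} \Big( \int F(x)(1 - F(x))\, J^{(1)}(F(x))\, dx \Big)^2$$

$$+ \frac{1}{n^2} \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big) F(x_2)(1 - F(x_2))\, J(F(x_1))\, J^{(2)}(F(x_2))\, dx_1\, dx_2$$

$$+ o(1/n^2). \qquad (4.2.9)$$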
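
As a rough numerical illustration of the first order formulas, the sketch below evaluates the coefficient of 1/n in (4.2.7) and in (4.2.8) when F is the uniform distribution on [0, 1], so that F(x) = x and the integrals run over [0, 1]. Everything here is an assumption made for illustration: the function names, the midpoint quadrature, and the particular smooth weight function J (a squared sine bump on [δ1, 1 - δ2], scaled to integrate to one). The bias coefficient is coded with the sign that follows from the minus sign in (4.2.10).

```python
import math


def make_weight(delta1, delta2):
    """Return (J, dJ): a smooth weight vanishing outside [delta1, 1 - delta2].

    J(u) = (2/L) sin^2(pi (u - delta1) / L) with L = 1 - delta1 - delta2.
    This particular J is a hypothetical example, not one from the text.
    """
    length = 1.0 - delta1 - delta2

    def J(u):
        if u <= delta1 or u >= 1.0 - delta2:
            return 0.0
        return 2.0 / length * math.sin(math.pi * (u - delta1) / length) ** 2

    def dJ(u):
        if u <= delta1 or u >= 1.0 - delta2:
            return 0.0
        return (2.0 * math.pi / length ** 2
                * math.sin(2.0 * math.pi * (u - delta1) / length))

    return J, dJ


def uniform_l_coefficients(J, dJ, grid=400):
    """Midpoint-rule values of n * (4.2.7) and n * (4.2.8) for F = Uniform(0, 1)."""
    h = 1.0 / grid
    mids = [(m + 0.5) * h for m in range(grid)]
    jv = [J(u) for u in mids]
    var_coef = 0.0
    for a, ja in zip(mids, jv):
        if ja == 0.0:
            continue
        for b, jb in zip(mids, jv):
            var_coef += (min(a, b) - a * b) * ja * jb
    var_coef *= h * h
    bias_coef = -0.5 * h * sum(u * (1.0 - u) * dJ(u) for u in mids)
    return var_coef, bias_coef


if __name__ == "__main__":
    J, dJ = make_weight(0.1, 0.3)
    v, b = uniform_l_coefficients(J, dJ)
    print("variance ~ %.5f / n, bias ~ %.5f / n" % (v, b))
```

Two sanity checks are available in this uniform case. With J identically one (the sample mean, which does not satisfy the trimming assumption and is used only as an algebraic check) the variance coefficient is the uniform variance 1/12. For any J vanishing at the endpoints, integration by parts turns the bias coefficient into the integral of (1/2 - u) J(u), which equals 0.1 for the asymmetric bump used above.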


The  remainder  of  this  section  consists  of  the  proofs  of  propositions  4.2.5  and  4.2.6. 

We will need the following version of Taylor's theorem, which can be found in Hardy (1952), to prove Frechet differentiability for both L- and M-estimates.

Lemma 4.2.7. If $f$ has a finite $k$th derivative at $x$ then

$$f(x + h) = \sum_{j=0}^{k} \frac{h^j}{j!} f^{(j)}(x) + o(h^k) \quad \text{as } h \to 0.$$


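Proof of proposition 4.2.5. To obtain differentials we recall (4.2.4) and definitions 3.3.1 and 3.3.2. If $G_i \in D$, $i = 1, 2, \ldots, j$, $1 \le j \le k$, then the following interchange of integration is justified under the assumptions of the proposition:

$$T_j(F; G_1, \ldots, G_j) = \int \cdots \int \Big( -\int_{\max_i x_i}^{\infty} J^{(j-1)}(F(y))\, dy \Big) \prod_{i=1}^{j} dG_i(x_i)$$

$$= \int \cdots \int \Big( -\int \Big( \prod_{i=1}^{j} \delta_{x_i}(y) \Big) J^{(j-1)}(F(y))\, dy \Big) \prod_{i=1}^{j} dG_i(x_i)$$

$$= -\int \Big( \prod_{i=1}^{j} \int \delta_{x_i}(y)\, dG_i(x_i) \Big) J^{(j-1)}(F(y))\, dy$$

$$= -\int \Big( \prod_{i=1}^{j} G_i(y) \Big) J^{(j-1)}(F(y))\, dy. \qquad (4.2.10)$$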


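Let $0 < \epsilon < \min(\delta_1, \delta_2)$. There exist $a$ and $b$ such that

$$-\infty < a < F^{-1}(\delta_1 - \epsilon) \le F^{-1}(1 - \delta_2 + \epsilon) < b < \infty.$$

If $\|G\|_\infty < \epsilon$ then for $x < a$ and $x > b$ we have

$$\int_{F(x)}^{F(x)+G(x)} J(u)\, du = 0.$$

For the remainder of the proof we assume that $\|G\|_\infty < \epsilon$. We now show that $\int_{F(x)}^{F(x)+G(x)} J(u)\, du$ has bounded total variation. For $n < \infty$, $a \le x_0 < \cdots < x_n \le b$,

$$\sum_{i=1}^{n} \Big| \int_{F(x_i)}^{F(x_i)+G(x_i)} J(u)\, du - \int_{F(x_{i-1})}^{F(x_{i-1})+G(x_{i-1})} J(u)\, du \Big|$$

$$= \sum_{i=1}^{n} \Big| \int_{F(x_{i-1})+G(x_{i-1})}^{F(x_i)+G(x_i)} J(u)\, du - \int_{F(x_{i-1})}^{F(x_i)} J(u)\, du \Big|$$

$$\le \|J\|_\infty \sum_{i=1}^{n} \big( 2\,|F(x_i) - F(x_{i-1})| + |G(x_i) - G(x_{i-1})| \big). \qquad (4.2.11)$$

Since $F, G \in D$ they are of bounded variation. It follows from (4.2.11) that $\int_{F(x)}^{F(x)+G(x)} J(u)\, du$ has bounded variation. We may now rewrite $T(F + G) - T(F)$ as

$$T(F + G) - T(F) = \int x\, d\Big( \int_0^{F(x)+G(x)} J(u)\, du \Big) - \int x\, d\Big( \int_0^{F(x)} J(u)\, du \Big)$$

$$= \int_{-\infty}^{\infty} x\, d\Big( \int_{F(x)}^{F(x)+G(x)} J(u)\, du \Big)$$

$$= \int_a^b x\, d\Big( \int_{F(x)}^{F(x)+G(x)} J(u)\, du \Big). \qquad (4.2.12)$$

Applying lemma 4.2.2 we obtain

$$T(F + G) - T(F) = -\int_a^b \int_{F(x)}^{F(x)+G(x)} J(u)\, du\, dx. \qquad (4.2.13)$$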

For $0 < i \le k$ equations (4.2.10) and (4.2.13) imply that the remainder of the $i$th order Taylor approximation is

$$|R_i(F; G)| = \Big| \int_a^b \Big( \int_{F(x)+G(x)}^{F(x)} J(u)\, du + \sum_{j=1}^{i} G^j(x)\, J^{(j-1)}(F(x))/j! \Big) dx \Big|$$

$$\le \|G\|_\infty^i \int_a^b \frac{\Big| \int_{F(x)+G(x)}^{F(x)} J(u)\, du + \sum_{j=1}^{i} G^j(x)\, J^{(j-1)}(F(x))/j! \Big|}{|G^i(x)|}\, dx$$

where the integrand of the last line is defined to be 0 when $G(x) = 0$. Thus to show that $T$ is $k$ times Frechet differentiable at $F$ it suffices to show that, for $0 < i \le k$, as $\|G\|_\infty$ goes to zero

$$\int_a^b W_{G,i}(x)\, dx = \int_a^b \frac{\Big| \int_{F(x)+G(x)}^{F(x)} J(u)\, du + \sum_{j=1}^{i} G^j(x)\, J^{(j-1)}(F(x))/j! \Big|}{|G^i(x)|}\, dx \to 0. \qquad (4.2.14)$$

Let $i$ be an integer with $0 < i \le k$. Let $A = \{x : J^{(j)}(u) \text{ is continuous at } u = F(x),\ 0 \le j < k\}$. If $x \in A$ then $W_{G,i}(x) \to 0$ as $\|G\|_\infty \to 0$ by the version of Taylor's theorem given in lemma 4.2.7. From the assumption that $J^{(k-1)}$ is continuous a.e. with respect to $F^{-1}$ it follows that the complement of $A$ has Lebesgue measure zero. Under the assumptions, we have by Taylor's theorem that

$$\Big| \int_{F(x)+G(x)}^{F(x)} J(u)\, du + \sum_{j=1}^{i-1} G^j(x)\, J^{(j-1)}(F(x))/j! \Big| \le \|J^{(i-1)}\|_\infty\, |G^i(x)|/i!.$$

Thus $W_{G,i}$ can be bounded by $2\,\|J^{(i-1)}\|_\infty$. It follows from the dominated convergence theorem that for any sequence $G_n$ such that $\|G_n\|_\infty$ goes to zero as $n \to \infty$ the integral $\int W_{G_n,i}(x)\, dx \to 0$. By a standard theorem from analysis it follows that $\int W_{G,i}(x)\, dx \to 0$ as $\|G\|_\infty \to 0$. ∎

Note that if $J^{(k-1)}$ is discontinuous at $u_1 \in (0, 1)$ then for any $\epsilon > 0$ there exists a cdf $H$ with $\|H - F\|_\infty < \epsilon$ and $H^{-1}(u)$ not continuous at $u_1$. Thus the complement of the set $A$ of the above proof is, in this case, not of measure zero and we may not apply the dominated convergence theorem as above. This means that we cannot extend this proof directly to show Frechet differentiability under the standard definition when $J^{(k-1)}$ is not continuous.

Proof of proposition 4.2.6. Let $X_1, X_2, \ldots$ be iid observations from $F$. Let $F_n$ be the empirical cdf of the first $n$ observations and let $X_{i:n}$ denote the $i$th order statistic of the first $n$ observations, $1 \le i \le n$. We may write

$$T(F_n) = \sum_{i=\lfloor \delta_1 n \rfloor}^{\lfloor (1-\delta_2) n \rfloor + 1} X_{i:n} \int_{(i-1)/n}^{i/n} J(u)\, du.$$

This implies

$$|T(F_n)| \le \big( |X_{\lfloor \delta_1 n \rfloor : n}| + |X_{\lfloor (1-\delta_2) n \rfloor + 1 : n}| \big)\, \|J\|_\infty.$$

Thus (4.2.5) follows from proposition 4.2.5 and theorem 3.4.1.
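
The representation of T(F_n) as a weighted sum of order statistics is easy to compute directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the weight of the i-th order statistic is obtained by a midpoint rule over ((i-1)/n, i/n), and the smooth trimming weight is an illustrative choice rather than one taken from the text.

```python
import math


def l_estimate(sample, J, grid=200):
    """T(F_n) = sum_i X_(i:n) * integral of J over ((i-1)/n, i/n).

    Each weight is approximated with a midpoint rule on `grid` cells.
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        lo = (i - 1) / n
        h = (1.0 / n) / grid
        weight = h * sum(J(lo + (m + 0.5) * h) for m in range(grid))
        total += x * weight
    return total


def smooth_trim_weight(u, delta=0.1):
    """A symmetric weight that vanishes outside [delta, 1 - delta].

    Squared-sine bump scaled to integrate to one (illustrative assumption).
    """
    if u <= delta or u >= 1.0 - delta:
        return 0.0
    length = 1.0 - 2.0 * delta
    return 2.0 / length * math.sin(math.pi * (u - delta) / length) ** 2


if __name__ == "__main__":
    data = [2.3, -0.7, 1.1, 0.4, 15.0, -1.9, 0.8, 0.2, -0.3, 1.6]
    print(l_estimate(data, smooth_trim_weight))
```

Because the weight function vanishes near 0 and 1, the extreme order statistics (here the outlying value 15.0) receive zero weight, which is the boundedness property the bound above relies on.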

We now wish to apply theorem 3.5.4 to show the remaining results. Equation (4.2.6) follows from (3.5.8). Given the restrictions of this proposition it is easy to justify all changes of order of integration in the following calculations. Derivation of the first formula is fairly detailed; steps are left out of subsequent calculations. In each case, we begin by applying (4.2.10). Recall $Z_i = \delta_{X_i} - F$.

$$E[(T_1(F; Z_1))^2] = \int \Big( \int (\delta_y(x) - F(x))\, J(F(x))\, dx \Big)^2 dF(y)$$

$$= \int\!\!\int\!\!\int (\delta_y(x_1) - F(x_1))(\delta_y(x_2) - F(x_2))\, dF(y) \prod_{i=1}^{2} J(F(x_i))\, dx_i$$

$$= \int\!\!\int\!\!\int \big( \delta_y(x_1)\delta_y(x_2) - \delta_y(x_1) F(x_2) - \delta_y(x_2) F(x_1) + F(x_1) F(x_2) \big)\, dF(y) \prod_{i=1}^{2} J(F(x_i))\, dx_i$$

$$= \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big) \prod_{i=1}^{2} J(F(x_i))\, dx_i.$$

Equation (4.2.7) now follows under the given conditions. This is the usual first order variance approximation for L-estimates. We continue with the bias approximation.

$$E[T_2(F; Z_1, Z_1)] = -\int\!\!\int (\delta_y(x) - F(x))^2\, J^{(1)}(F(x))\, dF(y)\, dx$$

$$= -\int F(x)(1 - F(x))\, J^{(1)}(F(x))\, dx.$$




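Equation (4.2.8) now follows under the given conditions. Finally we do the remaining calculations needed for the second order mean squared error approximation.

$$E[T_1(F; Z_1)\, T_2(F; Z_1, Z_1)] = \int\!\!\int\!\!\int (\delta_y(x_1) - F(x_1))(\delta_y(x_2) - F(x_2))^2\, J(F(x_1))\, J^{(1)}(F(x_2))\, dx_1\, dx_2\, dF(y)$$

$$= \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big)(1 - 2F(x_2))\, J(F(x_1))\, J^{(1)}(F(x_2))\, dx_1\, dx_2.$$

$$E[(T_2(F; Z_1, Z_2))^2] = \int\!\!\int\!\!\int\!\!\int \prod_{i=1}^{2} \Big( (\delta_{y_1}(x_i) - F(x_i))(\delta_{y_2}(x_i) - F(x_i))\, J^{(1)}(F(x_i))\, dx_i\, dF(y_i) \Big)$$

$$= \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big)^2 \prod_{i=1}^{2} J^{(1)}(F(x_i))\, dx_i.$$

$$E[T_1(F; Z_1)\, T_3(F; Z_1, Z_2, Z_2)] = \int\!\!\int \big( F(\min(x_1, x_2)) - F(x_1) F(x_2) \big) F(x_2)(1 - F(x_2))\, J(F(x_1))\, J^{(2)}(F(x_2))\, dx_1\, dx_2.$$

Equation (4.2.9) now follows under the given conditions. ∎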


§4.3  Theory  for  M-estimates. 

In this section we will show the Frechet differentiability of many M-estimates and give the theory for first and second order variance approximations. Boos and Serfling (1980) have given conditions under which an M-functional is one time Frechet differentiable. We give a somewhat different proof for conditions under which an M-functional is one, two, or three times Frechet differentiable.

For notational convenience we shall write $\int \psi\, dF$ instead of $\int_{-\infty}^{\infty} \psi(x)\, dF(x)$. In general when the argument of a function in an integrand is left out it will simply be the running variable (in this case $x$) and when the limits of integration are not given they will be $\pm\infty$. Otherwise arguments of functions and limits of integration will be explicitly given.

Definition 4.3.1. Let $\psi$ be a real-valued function on $\mathbb{R}$. An M-estimate corresponding to $\psi$ is $T(F_n)$ where $T(F)$ is any functional which satisfies

$$0 = \int \psi(x - T(F))\, dF(x)$$

on its domain of definition.
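
For an empirical cdf the defining equation becomes a sum over the observations, and when ψ is non-decreasing with ψ(0) = 0 the left side is monotone in the location argument, so the root can be bracketed by the sample range and found by bisection. The sketch below is one simple way to do this; the function name `m_estimate` and the choice of bisection are assumptions for illustration, not the author's procedure.

```python
import math


def m_estimate(sample, psi, tol=1e-10):
    """Solve sum_i psi(x_i - t) = 0 for t by bisection.

    Assumes psi is non-decreasing with psi(0) = 0, so the score is
    non-increasing in t, non-negative at min(sample) and non-positive
    at max(sample).
    """
    lo, hi = min(sample), max(sample)

    def score(t):
        return sum(psi(x - t) for x in sample)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [2.3, -0.7, 1.1, 0.4, 15.0, -1.9, 0.8, 0.2, -0.3, 1.6]
    # tanh is bounded, odd, smooth and non-decreasing.
    print(m_estimate(data, math.tanh))
```

With ψ(x) = x the equation reduces to the sample mean, which gives a quick check of the solver; a bounded ψ such as tanh limits the pull of the outlying observation.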

We now give a heuristic method of finding differentials for M-estimates. The usual candidate for the differential is

$$T_1(F; G) = \frac{d}{dh} T(F + hG)\Big|_{h=0}. \qquad (4.3.1)$$

For any $G \in D$ we define

$$\lambda_G(t) = \int \psi(x - t)\, dG(x).$$

Let $\mathcal{F}$ denote the domain of definition of $T$. For $G \in \mathcal{F}$ we have $\lambda_G(T(G)) = 0$. Noting that

$$\lambda_{F+hG}(t) = \lambda_F(t) + h \lambda_G(t)$$

we have thus

$$0 = \lambda_F(T(F + hG)) + h \lambda_G(T(F + hG)).$$

Differentiating this with respect to $h$ we obtain

$$0 = \lambda_F'(T(F + hG)) \frac{d}{dh} T(F + hG) + \lambda_G(T(F + hG)) + h \lambda_G'(T(F + hG)) \frac{d}{dh} T(F + hG). \qquad (4.3.2)$$

Setting $h = 0$, this and (4.3.1) yield

$$T_1(F; G) = -\frac{\lambda_G(T(F))}{\lambda_F'(T(F))}. \qquad (4.3.3)$$

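Differentiating both sides of (4.3.2) with respect to $h$ again we obtain

$$0 = \lambda_F''(T(F + hG)) \Big( \frac{d}{dh} T(F + hG) \Big)^2 + \lambda_F'(T(F + hG)) \frac{d^2}{dh^2} T(F + hG)$$

$$+ 2 \lambda_G'(T(F + hG)) \frac{d}{dh} T(F + hG) + h \lambda_G''(T(F + hG)) \Big( \frac{d}{dh} T(F + hG) \Big)^2$$

$$+ h \lambda_G'(T(F + hG)) \frac{d^2}{dh^2} T(F + hG).$$

Setting $h = 0$ in this equation and applying (4.3.1) and (4.3.3) we obtain a candidate for the second differential:

$$T_2(F; G) = -\frac{\lambda_F''(T(F)) (T_1(F; G))^2 + 2 \lambda_G'(T(F))\, T_1(F; G)}{\lambda_F'(T(F))}$$

$$= \frac{2 \lambda_G(T(F))\, \lambda_G'(T(F))}{(\lambda_F'(T(F)))^2} - \frac{(\lambda_G(T(F)))^2\, \lambda_F''(T(F))}{(\lambda_F'(T(F)))^3}. \qquad (4.3.4)$$

The candidate for the third differential is obtained in the same fashion:

$$T_3(F; G) = -\Big( \lambda_F'''(T(F)) (T_1(F; G))^3 + 3 \lambda_F''(T(F))\, T_1(F; G)\, T_2(F; G)$$

$$+ 3 \lambda_G''(T(F)) (T_1(F; G))^2 + 3 \lambda_G'(T(F))\, T_2(F; G) \Big) (\lambda_F'(T(F)))^{-1}$$

$$= \frac{(\lambda_G(T(F)))^3\, \lambda_F'''(T(F))}{(\lambda_F'(T(F)))^4} + \frac{9 (\lambda_G(T(F)))^2\, \lambda_G'(T(F))\, \lambda_F''(T(F))}{(\lambda_F'(T(F)))^4}$$

$$- \frac{3 (\lambda_G(T(F)))^2\, \lambda_G''(T(F))}{(\lambda_F'(T(F)))^3} - \frac{6 \lambda_G(T(F)) (\lambda_G'(T(F)))^2}{(\lambda_F'(T(F)))^3} - \frac{3 (\lambda_G(T(F)))^3 (\lambda_F''(T(F)))^2}{(\lambda_F'(T(F)))^5}. \qquad (4.3.5)$$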

We will give an example of how to obtain kernels from (4.3.3)-(4.3.5). We consider $T_2$. A bilinear function $T_2(F; G_1, G_2)$ satisfying $T_2(F; G) = T_2(F; G, G)$ where $T_2(F; G)$ is as in (4.3.4) is

$$T_2(F; G_1, G_2) = \frac{\lambda_{G_1}(T(F))\, \lambda_{G_2}'(T(F)) + \lambda_{G_2}(T(F))\, \lambda_{G_1}'(T(F))}{(\lambda_F'(T(F)))^2} - \frac{\lambda_{G_1}(T(F))\, \lambda_{G_2}(T(F))\, \lambda_F''(T(F))}{(\lambda_F'(T(F)))^3}.$$

Switching the order of integration and differentiation, and replacing $G_1$ and $G_2$ with distributions degenerate at $x_1$ and $x_2$ respectively, we obtain a candidate for the second kernel (assuming $T(F) = 0$)

$$h_2(F; x_1, x_2) = -\frac{\psi(x_1) \psi'(x_2) + \psi(x_2) \psi'(x_1)}{(\int \psi'\, dF)^2} + \frac{\psi(x_1) \psi(x_2) \int \psi''\, dF}{(\int \psi'\, dF)^3}.$$


We  will  now  give  sufficient  conditions  for  a  functional  corresponding  to  an  M-estimate  to 
be  one,  two  or  three  times  Frechet  differentiable  according  to  definition  3.3.2  (recall  this  has  slightly 
weaker  requirements  than  the  standard  definition).  No  additional  concepts  should  be  needed  to 
extend  the  proof  of  the  theorem  to  higher  order  differentials. 


Proposition 4.3.2. Let $F$ be a cdf. Let $\psi$ be such that $\int \psi\, dF = 0$. Let $k$ be 1, 2, or 3. Assume that $\psi$ and its first $k - 1$ derivatives are absolutely continuous everywhere and have bounded total variation. Assume that the $k$th derivative of $\psi$ exists a.e. with respect to Lebesgue measure and with respect to the measure corresponding to $F$, and has bounded total variation. This implies that the integral $\int \psi'\, dF$ is well defined. Assume $\int \psi'\, dF > 0$. For $G \in D$ with $\|G\|_\infty$ sufficiently small there exists a functional $T(F + G)$ such that $0 = \int \psi(x - T(F + G))\, d(F(x) + G(x))$. Let $\mathcal{F}$ be a neighborhood of $F$ in $D$ where such a $T$ can be defined. Then $T$ is $k$ times Frechet differentiable at $F$ with respect to $\|\cdot\|_\infty$ and (4.3.3)-(4.3.5) hold. The kernels of the first three differentials of $T$ are

$$h_1(F; x) = \frac{\psi(x)}{\int \psi'\, dF}, \qquad (4.3.6)$$

$$h_2(F; x_1, x_2) = -\frac{\psi(x_1) \psi'(x_2) + \psi'(x_1) \psi(x_2)}{(\int \psi'\, dF)^2} + \frac{\psi(x_1) \psi(x_2) \int \psi''\, dF}{(\int \psi'\, dF)^3} \qquad (4.3.7)$$

and

$$h_3(F; x_1, x_2, x_3) = -\psi(x_1) \psi(x_2) \psi(x_3) \frac{\int \psi'''\, dF}{(\int \psi'\, dF)^4}$$

$$- 3 \big( \psi(x_1) \psi(x_2) \psi'(x_3) + \psi(x_1) \psi'(x_2) \psi(x_3) + \psi'(x_1) \psi(x_2) \psi(x_3) \big) \frac{\int \psi''\, dF}{(\int \psi'\, dF)^4}$$

$$+ 3 \psi(x_1) \psi(x_2) \psi(x_3) \frac{(\int \psi''\, dF)^2}{(\int \psi'\, dF)^5}$$

$$+ \frac{\psi(x_1) \psi(x_2) \psi''(x_3) + \psi(x_1) \psi''(x_2) \psi(x_3) + \psi''(x_1) \psi(x_2) \psi(x_3)}{(\int \psi'\, dF)^3}$$

$$+ 2\, \frac{\psi(x_1) \psi'(x_2) \psi'(x_3) + \psi'(x_1) \psi(x_2) \psi'(x_3) + \psi'(x_1) \psi'(x_2) \psi(x_3)}{(\int \psi'\, dF)^3}. \qquad (4.3.8)$$

We can apply this proposition to obtain a result on moment approximations of M-estimates with non-decreasing $\psi$ functions. We restrict ourselves to the case of the underlying distribution being symmetric. This is not necessary, but is a common assumption and makes the expressions simpler.


Proposition 4.3.3. Let $F$ be a cdf which is symmetric about 0 such that

$$0 < a = \liminf_{x \to \infty} \frac{-\log(1 - F(x))}{\log x}.$$

Let $\psi$ be a bounded, non-decreasing, absolutely continuous, odd function. Assume that $\psi'$ exists and is continuous a.e. with respect to Lebesgue measure and with respect to the measure corresponding to $F$, is of bounded total variation, and satisfies $\int \psi'\,dF > 0$. Let $\mathcal{T} \subset D$ be the set where a solution $T(G)$ of $0 = \int \psi(x - T(G))\,dG(x)$ exists. Then $\mathcal{T}$ contains all $G \in D$ which are non-decreasing and a neighborhood of $F$ with respect to $\|\cdot\|_\infty$. If $G$ is non-negative and non-decreasing this solution minimizes $\int \rho(x - T(G))\,dG(x)$ where $\rho(x) = \int_0^x \psi(y)\,dy$. Letting $F_n$ denote the empirical cdf of $n$ iid observations from $F$ it follows that $T(F_n)$ minimizes $\int \rho(x - T(F_n))\,dF_n$ and

$$E[T(F_n)] = 0, \qquad E[(T(F_n))^2] = \frac{1}{n}\,\frac{\int \psi^2\,dF}{(\int \psi'\,dF)^2} + o(1/n). \qquad (4.3.9)$$

Assume, in addition, that the first two derivatives of $\psi$ are absolutely continuous and bounded. Assume also that the third derivative of $\psi$ exists, is continuous a.e. with respect to Lebesgue measure and with respect to the measure corresponding to $F$, and has bounded total variation. Then

$$\begin{aligned}
E[(T(F_n))^2] = {} & \frac{n-1}{n^2}\,\frac{\int \psi^2\,dF}{(\int \psi'\,dF)^2}
- \frac{2}{n^2}\,\frac{\int \psi^2\psi'\,dF}{(\int \psi'\,dF)^3}
+ \frac{3}{n^2}\,\frac{\int \psi^2\,dF \int (\psi')^2\,dF}{(\int \psi'\,dF)^4} \\
& + \frac{3}{n^2}\,\frac{\int \psi^2\,dF \int \psi\psi''\,dF}{(\int \psi'\,dF)^4}
- \frac{1}{n^2}\,\frac{(\int \psi^2\,dF)^2 \int \psi'''\,dF}{(\int \psi'\,dF)^5}
+ o(1/n^2). \qquad (4.3.10)
\end{aligned}$$

The remainder of this section is devoted to proving the above two propositions. We begin by giving a series of three lemmas.

Lemma 4.3.4. Suppose $\psi$ is a function of bounded total variation. Then there exists $A > 0$ such that for all $G \in D$

$$\sup_{-\infty < t < \infty} \Big| \int \psi(x - t)\,dG(x) \Big| \le A\,\|G\|_\infty .$$

Proof: Since $\psi$ has bounded total variation it may be written as the difference of two bounded non-decreasing functions $\psi_1$ and $\psi_2$. Let $A$ be such that $A/2$ is a bound for these two functions. For any fixed value of $t$

$$\Big| \int \psi(x - t)\,dG(x) \Big| \le \Big| \int \psi_1(x - t)\,dG(x) \Big| + \Big| \int \psi_2(x - t)\,dG(x) \Big| .$$

But for $i = 1, 2$

$$\Big| \int \psi_i(x - t)\,dG(x) \Big| \le (A/2) \sup_{-\infty < y < \infty} \Big| \int_{-\infty}^{y} dG(x) \Big| \le (A/2)\,\|G\|_\infty ,$$

and the contention follows since $t$ was arbitrary. |

Lemma 4.3.5. Let $F \in D$. Suppose $\psi$ is absolutely continuous and bounded, and that $\psi'$ exists a.e. $F$ and Lebesgue, and is bounded where it exists. Then

$$\frac{d}{dt} \int \psi(x - t)\,dF(x) \Big|_{t=0} = -\int \psi'(x)\,dF(x).$$

If, in addition, $\psi$ is differentiable at each real $x$ then for any $G \in D$, $t \in \mathbb{R}$

$$\frac{d}{dt} \int \psi(x - t)\,dG(x) = -\int \psi'(x - t)\,dG(x).$$

Proof: By the definition of derivatives

$$\frac{d}{dt} \int \psi(x - t)\,dF(x) = \lim_{h \to 0} \int \frac{\psi(x - t - h) - \psi(x - t)}{h}\,dF(x).$$

For $t = 0$ the limit of the integrand of the right hand side is $-\psi'(x)$ except at a set of $F$ measure zero. Since the integrand is bounded by $\sup_{-\infty < y < \infty} |\psi'(y)|$ the first result follows by the dominated convergence theorem.

The proof is the same for the second claim since $\psi'(x - t)$ exists for a.e. $x$ with respect to $G$ for any $G \in D$ under the additional assumption. |

Our proof of Frechet differentiability uses this lemma. If $\psi$ is not continuously differentiable then for any $\epsilon > 0$ there exists $H \in D$ with $\|H - F\|_\infty < \epsilon$ such that $H$ is not continuous at a point of discontinuity of $\psi'$. Thus we cannot use this lemma to show the standard definition of Frechet differentiability holds if $\psi$ is not continuously differentiable.

Lemma 4.3.6. Let $F$ be a cdf. Suppose $\psi$ is an absolutely continuous function with bounded total variation whose derivative exists, is bounded, and is continuous a.e. $F$. Suppose

$$\lambda_F(t) = \int \psi(x - t)\,dF(x),$$

$\lambda_F(0) = 0$, and $\lambda'_F(0) < 0$. Then there exists a functional $T(G)$ in a neighborhood of $F$ in $D$ such that

$$\lambda_G(T(G)) = \int \psi(x - T(G))\,dG(x) = 0$$

and

$$|T(G) - T(F)| = O(\|G - F\|_\infty).$$

Proof: From lemma 4.3.4 there exists $A > 0$ such that

$$\sup_{-\infty < t < \infty} |\lambda_G(t) - \lambda_F(t)| = \sup_{-\infty < t < \infty} \Big| \int \psi(x - t)\,(dG(x) - dF(x)) \Big| \le A\,\|G - F\|_\infty .$$

Let $\delta_1 > 0$, $0 < \delta_2 < -\lambda'_F(0)$ be such that if $t \in (-\delta_1, \delta_1)$ and $t \ne 0$ then $\lambda_F(t)/t < \lambda'_F(0) + \delta_2$. For $t \in (0, \delta_1)$

$$\lambda_G(t) \le \lambda_F(t) + A\,\|G - F\|_\infty < t(\lambda'_F(0) + \delta_2) + A\,\|G - F\|_\infty .$$

It follows that if $-A\,\|G - F\|_\infty / (\lambda'_F(0) + \delta_2) < t < \delta_1$ then $\lambda_G(t) < 0$. Similarly, if $A\,\|G - F\|_\infty / (\lambda'_F(0) + \delta_2) > t > -\delta_1$ then $\lambda_G(t) > 0$. Since $\lambda_G(t)$ is continuous in $t$ for each $G \in D$, there is a solution of $\lambda_G(t) = 0$ with $|t| \le -A\,\|G - F\|_\infty / (\lambda'_F(0) + \delta_2)$ if $\|G - F\|_\infty < -\delta_1(\lambda'_F(0) + \delta_2)/A$. |

Proof of proposition 4.3.2. Under the assumptions, we have by lemma 4.3.5 that, for $j = 1, \ldots, k$,

$$\frac{d^j}{dt^j} \int \psi(x - t)\,dF(x) \Big|_{t=0} = (-1)^j \int \psi^{(j)}\,dF.$$

We also have for arbitrary $G \in D$, $t \in \mathbb{R}$, and $j = 1, \ldots, k-1$,

$$\frac{d^j}{dt^j} \int \psi(x - t)\,dG(x) = (-1)^j \int \psi^{(j)}(x - t)\,dG.$$

Using these facts it is easy to show that the formulas (4.3.3)-(4.3.5) hold.

For the remainder of the proof assume that $\|G - F\|_\infty$ is sufficiently small so that $T(G)$ is well defined. By assumption $-\lambda'_F(0) = \int \psi'\,dF > 0$. Thus we may show

$$-\lambda'_F(0)R_i(F; G - F) = -\lambda'_F(0)\Big(T(G) - T(F) - \sum_{j=1}^{i} T_j(F; G - F)/j!\Big)$$

is $o(\|G - F\|^i_\infty)$, $0 < i \le k$, to show Frechet differentiability. For $i = 0$ the result follows from lemma 4.3.6. Recall $\lambda_F(0) = 0$, $\lambda_G(T(G)) = 0$. From (4.3.3) we have

$$\begin{aligned}
-\lambda'_F(0)R_1(F; G - F) &= -\lambda'_F(0)T(G) - \lambda_G(0) \\
&= \lambda_F(T(G)) - \lambda'_F(0)T(G) \\
&\quad + \lambda_G(T(G)) + \lambda_F(0) - \lambda_F(T(G)) - \lambda_G(0). \qquad (4.3.11)
\end{aligned}$$

By lemma 4.3.6, $T(G) = T(G) - T(F) = O(\|G - F\|_\infty)$ and thus $\lambda_F(T(G)) - \lambda'_F(0)T(G) = o(\|G - F\|_\infty)$. We may rewrite the last line of (4.3.11) as

$$\int \big(\psi(x - T(G)) - \psi(x)\big)\,d(G - F). \qquad (4.3.12)$$

Since $\psi$ is uniformly continuous and $T(G) = O(\|G - F\|_\infty)$ this may be written as

$$O(\|G - F\|_\infty) \int d|G - F| = O(\|G - F\|^2_\infty).$$

Applying the triangle inequality it follows that $T$ is one time Frechet differentiable.

We continue with similar proofs for second and third differentials. First we note that for $k > 1$

$$\begin{aligned}
-\lambda'_F(0)R_k(F; G - F) &= -\lambda'_F(0)\Big(T(G) - T(F) - \sum_{j=1}^{k} T_j(F; G - F)/j!\Big) \\
&= -\lambda'_F(0)R_{k-1}(F; G - F) + \lambda'_F(0)T_k(F; G - F)/k!. \qquad (4.3.13)
\end{aligned}$$

From (4.3.4), (4.3.11) and (4.3.13) we have (after rearrangement)

$$\begin{aligned}
-\lambda'_F(0)R_2(F; G - F) = {} & \lambda_F(T(G)) - \sum_{j=1}^{2} \frac{(T(G))^j}{j!}\lambda^{(j)}_F(0) \\
& - \Big(\lambda_G(0) + \lambda_F(T(G)) + (\lambda'_G(0) - \lambda'_F(0))T(G)\Big) \\
& + \tfrac{1}{2}\lambda''_F(0)\Big((T(G))^2 - (T_1(F; G - F))^2\Big) \\
& + (\lambda'_G(0) - \lambda'_F(0))\Big(T(G) - T_1(F; G - F)\Big). \qquad (4.3.14)
\end{aligned}$$

The first line of this is a second order Taylor's series expansion and is $o(\|G - F\|^2_\infty)$ by the version of Taylor's theorem given in lemma 4.2.7.

The second line of (4.3.14) may be rewritten as

$$\int \big(\psi(x - T(G)) - \psi(x) + \psi'(x)T(G)\big)\,d(G - F). \qquad (4.3.15)$$

Under the condition that $\psi'$ is continuous everywhere and of bounded total variation, the integrand may be bounded uniformly by a constant times $(T(G))^2$. Thus (4.3.15) can be shown to be $O(\|G - F\|^3_\infty)$ as (4.3.12) was shown to be $O(\|G - F\|^2_\infty)$.

To show the third line of (4.3.14) is $o(\|G - F\|^2_\infty)$ we note that we have shown

$$T(G) = T_1(F; G - F) + o(\|G - F\|_\infty)$$

and thus

$$(T(G))^2 = (T_1(F; G - F))^2 + o(\|G - F\|_\infty)\,T_1(F; G - F) + o(\|G - F\|^2_\infty).$$

But $T_1(F; G - F) = O(\|G - F\|_\infty)$ by lemma 3.4.6.

In the fourth line of (4.3.14) we have

$$\lambda'_G(0) - \lambda'_F(0) = -\int \psi'\,d(G - F) = O(\|G - F\|_\infty)$$

and $T(G) - T_1(F; G - F) = o(\|G - F\|_\infty)$ which implies the product is $o(\|G - F\|^2_\infty)$.

Applying the triangle inequality we see that $-\lambda'_F(0)R_2(F; G - F) = o(\|G - F\|^2_\infty)$ and we have shown that $T$ is two times Frechet differentiable at $F$.


From (4.3.5), (4.3.13) and (4.3.14) we have (after rearrangement)

$$\begin{aligned}
-\lambda'_F(0)R_3(F; G - F) = {} & \lambda_F(T(G)) - \sum_{j=1}^{3} \frac{(T(G))^j}{j!}\lambda^{(j)}_F(0) \\
& - \Big(\lambda_G(0) + \lambda_F(T(G)) + \sum_{j=1}^{2} \frac{(T(G))^j}{j!}\big(\lambda^{(j)}_G(0) - \lambda^{(j)}_F(0)\big)\Big) \\
& + (\lambda'_G(0) - \lambda'_F(0))\Big(T(G) - T_1(F; G - F) - \tfrac{1}{2}T_2(F; G - F)\Big) \\
& + \tfrac{1}{2}(\lambda''_G(0) - \lambda''_F(0))\Big((T(G))^2 - (T_1(F; G - F))^2\Big) \\
& + \tfrac{1}{2}\lambda''_F(0)\Big((T(G))^2 - (T_1(F; G - F))^2 - T_1(F; G - F)T_2(F; G - F)\Big) \\
& + \tfrac{1}{6}\lambda'''_F(0)\Big((T(G))^3 - (T_1(F; G - F))^3\Big). \qquad (4.3.16)
\end{aligned}$$

The ideas needed to show that each line of the right hand side of (4.3.16) is $o(\|G - F\|^3_\infty)$ have already been given. We omit the details. |

Proof of proposition 4.3.3. We have shown in proposition 4.3.2 that $T(G)$ is well defined in a neighborhood of $F$. Let $G$ be a non-decreasing element of $D$. As $t \to \infty$, $\lambda_G(t) = \int \psi(x - t)\,dG < 0$ and as $t \to -\infty$, $\lambda_G(t) > 0$. Since $\lambda_G(t)$ is continuous and non-increasing this implies that there is a solution $T(G)$ of the equation $\lambda_G(t) = 0$. This also implies $\lambda_G(t) \ge 0$ for all $t < T(G)$ and $\lambda_G(t) \le 0$ for all $t > T(G)$. Since $\rho$ is convex we have

$$\int \rho(x - t)\,dG(x) \ge \int \rho(x - T(G))\,dG(x) + (T(G) - t) \int \psi(x - t)\,dG(x). \qquad (4.3.17)$$

This holds even if $\int \rho(x - T(G))\,dG(x) = \infty$ since $\rho$ is non-negative and $G$ is totally positive. The second term of the right hand side of (4.3.17) is greater than or equal to zero, and thus $t = T(G)$ minimizes $\int \rho(x - t)\,dG(x)$.
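The existence argument above relies only on $\lambda_G(t)$ being continuous and non-increasing in $t$, which is also what makes bisection a safe way to locate $T(F_n)$ numerically. The following is a minimal sketch under stated assumptions, not the procedure used in this work: the names `lam` and `m_estimate` are illustrative, and the bracket used assumes $\psi$ is odd, non-decreasing, and strictly positive for positive arguments (for example `math.tanh`).

```python
import math

def lam(psi, xs, t):
    """lambda_{F_n}(t) = (1/n) * sum_i psi(x_i - t) for the empirical cdf of xs."""
    return sum(psi(x - t) for x in xs) / len(xs)

def m_estimate(psi, xs, tol=1e-12):
    """Solve lam(t) = 0 by bisection.

    Assumes psi is non-decreasing, so lam is non-increasing in t, and that
    psi(u) > 0 for u > 0 and psi(u) < 0 for u < 0, so the bracket below
    satisfies lam(lo) > 0 > lam(hi).
    """
    lo, hi = min(xs) - 1.0, max(xs) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(psi, xs, mid) > 0.0:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies at or to the left of mid
    return 0.5 * (lo + hi)

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
estimate = m_estimate(math.tanh, sample)
shifted = m_estimate(math.tanh, [x + 3.0 for x in sample])
```

For a sample that is symmetric about a point, the root is that point, and shifting every observation shifts the estimate by the same amount; both properties follow from $\psi$ being odd and from the form of the estimating equation.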


We will now justify the application of theorem 3.4.1. Proposition 4.3.2 gives us the existence of the appropriate differentials. The tail condition needed on $F$ is assumed. Let $A$ be such that if $x > A$ then $\psi(x) > (1/2) \sup_y \psi(y) = B/2$. Let $\epsilon = \min(1 - F(A), 1/4)$. Then

$$\int \psi(x - X_{\lfloor \epsilon n \rfloor : n} + A)\,dF_n \ge \frac{3B}{8} - \frac{B}{4} = \frac{B}{8} > 0$$

which implies $T(F_n) > X_{\lfloor \epsilon n \rfloor : n} - A$. Similarly $T(F_n) < X_{\lfloor (1-\epsilon) n \rfloor + 1 : n} + A$. Thus $|T(F_n)| < |X_{\lfloor \epsilon n \rfloor : n}| + |X_{\lfloor (1-\epsilon) n \rfloor + 1 : n}| + 2A$ and condition ii of theorem 3.4.1 is satisfied.

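We may now apply theorem 3.5.3. Under the symmetry assumptions we have $\int \psi''\,dF = 0$ and thus the kernels are simplified as follows:

$$h_1(F; x) = \frac{\psi(x)}{\int \psi'\,dF},$$

$$h_2(F; x_1, x_2) = -\frac{\psi(x_1)\psi'(x_2) + \psi'(x_1)\psi(x_2)}{(\int \psi'\,dF)^2},$$

$$\begin{aligned}
h_3(F; x_1, x_2, x_3) = {} & -\psi(x_1)\psi(x_2)\psi(x_3)\frac{\int \psi'''\,dF}{(\int \psi'\,dF)^4} \\
& + \frac{\psi(x_1)\psi(x_2)\psi''(x_3) + \psi(x_1)\psi''(x_2)\psi(x_3) + \psi''(x_1)\psi(x_2)\psi(x_3)}{(\int \psi'\,dF)^3} \\
& + 2\,\frac{\psi(x_1)\psi'(x_2)\psi'(x_3) + \psi'(x_1)\psi(x_2)\psi'(x_3) + \psi'(x_1)\psi'(x_2)\psi(x_3)}{(\int \psi'\,dF)^3}.
\end{aligned}$$

From definition 3.3.1 and theorem 3.5.3 the calculations to show the desired results are now simple. We give one example. We have (recall $Z_i = \delta_{X_i} - F$)

$$T_1(F; Z_1) = \frac{\psi(X_1) - \int \psi\,dF}{\int \psi'\,dF} = \frac{\psi(X_1)}{\int \psi'\,dF},$$

$$T_2(F; Z_1, Z_1) = -2\,\frac{\psi(X_1)\psi'(X_1) - \psi(X_1)\int \psi'\,dF - \psi'(X_1)\int \psi\,dF + \int \psi\,dF \int \psi'\,dF}{(\int \psi'\,dF)^2},$$

so, since $\int \psi\,dF = 0$,

$$T_2(F; Z_1, Z_1) = -2\,\frac{\psi(X_1)\big(\psi'(X_1) - \int \psi'\,dF\big)}{(\int \psi'\,dF)^2}.$$

Thus

$$E[T_1(F; Z_1)T_2(F; Z_1, Z_1)] = -2\,\frac{\int \psi^2\psi'\,dF}{(\int \psi'\,dF)^3} + 2\,\frac{\int \psi^2\,dF}{(\int \psi'\,dF)^2}.$$

|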

Chapter  5 

Applications 


§5.1  Introduction. 

We  are  at  last  prepared  to  try  to  give  the  reader  a  feel  for  the  applications  and  limitations 
of  the  theory  we  have  presented  by  discussing  some  numerical  examples  and  counterexamples.  We 
do  not  make  any  sort  of  extensive  study  of  estimators  or  seriously  attempt  to  find  any  optimal 
estimators.  Work  in  this  direction  is  being  done  by  Eynon  (1982). 

In  section  2  we  present  some  valid  applications  of  our  theory.  We  demonstrate  using  Monte 
Carlo  studies  that  the  second  order  variance  approximations  yield  big  improvements  over  the  first 
order  approximations  in  some  cases.  Often  the  second  order  variance  approximation  is  ‘better’  than 
Monte  Carlo  approximation  because  the  amount  of  calculation  needed  to  obtain  the  same  degree  of 
accuracy  by  simulation  is  large. 

In  section  3,  we  give  some  initial  simulation  results  for  nonparametric  variance  estimates 
which  are  first  and  second  order  expansions  for  variances  evaluated  at  the  empirical  cdf. 

In  section  4  we  consider  the  median  and  trimmed  means.  These  statistics  do  not  satisfy 
the  Frechet  differentiability  conditions  of  theorem  3.4.1.  We  show  that  we  do  not  necessarily  obtain 
valid  variance  expansions  in  these  cases  by  taking  the  limit  of  expansions  for  statistics  which  do 
meet  the  differentiability  conditions  and  approach  the  desired  statistic. 


§5.2  Initial  examples. 

In this section we give numerical examples of approximations which apply the theory of chapter 4 for small to moderate sample sizes. Using simulation we show that in the cases presented these approximations are quite good. Various applications are suggested. We use the incomplete beta function

$$I_y(p, q) = \frac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)} \int_0^y t^{p-1}(1 - t)^{q-1}\,dt$$

to define our estimators. The functions defining our M-estimates (recall definition 4.3.1) are defined for positive $a$ by

$$\psi_a(x) = \begin{cases} I_{(1 + x/a)/2}(3, 3) - 1/2, & \text{if } |x| \le a, \\ 1/2, & \text{if } x > a, \\ -1/2, & \text{if } x < -a. \end{cases} \qquad (5.2.1)$$

Figure 1 presents a graph of $\psi_a$.

Figure 1. Graph of $\psi_a$ as defined in (5.2.1).

The first three derivatives of $\psi_a$ are piecewise polynomial. The first two derivatives of $\psi_a$ are continuous everywhere, and the third derivative does not exist at $\pm a$ and is continuous elsewhere.
The three distribution functions we will use in our examples are the standard normal,

$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy, \qquad (5.2.2)$$

the Cauchy

$$F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x), \qquad (5.2.3)$$

and the Laplace

$$F(x) = \begin{cases} 1 - \tfrac{1}{2}e^{-x}, & \text{if } x > 0, \\ \tfrac{1}{2}e^{x}, & \text{otherwise.} \end{cases} \qquad (5.2.4)$$

All of the conditions of proposition 4.3.3 needed to obtain first and second order variance approximations are easily verified for any positive $a$ and any of the above three distribution functions. Although our theory does apply to distributions with discrete components, we do not give any examples using such distributions.


For each of our examples we compare first and second order variance expansions from proposition 4.3.3 (for M-estimates) or proposition 4.2.6 (for L-estimates) with variance approximations obtained by simulation. In our simulations we use a combination of linear congruential and Fibonacci pseudo random number generators as recommended in Knuth (1969) (see pp. 9, 26, 30) to generate uniform pseudo random numbers. To obtain normal pseudo random numbers we use the Box-Muller (1959) transformation in combination with the above uniform generator. To obtain Cauchy and Laplace pseudo random variables we divide normal pseudo random variables by independent random variables having the appropriate distributions. See Andrews et al. (1972), pp. 56-57. We use these so called normal/independent generators so that we may use the variance reduction techniques described in Andrews et al. (1972), and (more thoroughly) in Simon (1976). For the L-estimates we present we use precisely these variance reduction techniques. For M-estimates we must use a slightly different procedure as our M-estimates are not scale invariant. The difference in swindling techniques may sometimes cause the standard error of simulation approximations for variances of M-estimates to be slightly larger than those for L-estimates.
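The normal/independent construction described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in generator rather than the Knuth generators, and the divisors are our choice of standard representations (the absolute value of an independent normal for the Cauchy, and $1/\sqrt{2E}$ with $E$ a unit exponential for the Laplace), not necessarily those of Andrews et al. (1972). The function names are hypothetical.

```python
import math
import random

def positive_uniform(rng):
    """Uniform draw on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def box_muller(rng):
    """Two independent standard normal draws from two uniform draws."""
    r = math.sqrt(-2.0 * math.log(positive_uniform(rng)))
    theta = 2.0 * math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)

def cauchy_draw(rng):
    """Standard Cauchy as normal / |independent normal| (assumed representation)."""
    z, v = box_muller(rng)
    while v == 0.0:
        z, v = box_muller(rng)
    return z / abs(v)

def laplace_draw(rng):
    """Laplace (5.2.4) as normal / V with V = 1/sqrt(2E), E unit exponential (assumed)."""
    z, _ = box_muller(rng)
    e = -math.log(positive_uniform(rng))   # e > 0 because the uniform is < 1
    v = 1.0 / math.sqrt(2.0 * e)
    return z / v

def median(values):
    s = sorted(values)
    m = len(s) // 2
    return s[m] if len(s) % 2 else 0.5 * (s[m - 1] + s[m])

rng = random.Random(12345)
cauchy_abs_median = median(abs(cauchy_draw(rng)) for _ in range(20000))
laplace_abs_median = median(abs(laplace_draw(rng)) for _ in range(20000))
```

As a sanity check, the median of $|X|$ is 1 for the standard Cauchy and $\log 2$ for the Laplace distribution of (5.2.4), so the two sample medians should be close to those values.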

Table 1 presents comparisons of approximations done by simulation and by the expansions of proposition 4.3.3. Except for the first order approximation with $n = 10$ in the Cauchy example all of the expansion approximations appear to be quite good. Although we have not given (complete) theoretical justification, we suspect that the behavior of the variance expansions can be described as

$$n\,\mathrm{Var}(T(F_n)) = \sigma_1^2 + (1/n)\sigma_2^2 + (1/n^2)\sigma_3^2 + o(n^{-2}).$$

Given that this formula is valid we would expect that the error in the first order approximation is approximately $\sigma_2^2/n$ which is halved as $n$ doubles, and the error in the second order approximation is approximately $\sigma_3^2/n^2$ which is quartered as $n$ doubles. Because of the standard error of the simulation approximations it is difficult to tell if the error is behaving like this in many cases. However, the error in the first order expansion for the Cauchy and Laplace examples does appear to halve as $n$ doubles and for the Cauchy example it appears that the error of the second order approximation is quartered as $n$ doubles.

F, a                    Variance approximation    n = 10          n = 20          n = 40

Normal distribution     Simulation size           40,000          20,000          10,000
a = 1                   Simulation (Std. Err.)    1.197 (.002)    1.203 (.002)    1.203 (.003)
                        1st order                 1.208           1.208           1.208
                        2nd order                 1.200           1.204           1.206

Cauchy distribution     Simulation size           160,000         80,000          40,000
a = .6                  Simulation (Std. Err.)    3.341 (.017)    2.714 (.012)    2.472 (.012)
                        1st order                 2.278           2.278           2.278
                        2nd order                 2.959           2.619           2.449

Laplace distribution    Simulation size           160,000         80,000          40,000
a = 1.5                 Simulation (Std. Err.)    1.419 (.004)    1.335 (.005)    1.295 (.007)
                        1st order                 1.266           1.266           1.266
                        2nd order                 1.409           1.338           1.302

Table 1. Variance approximations for M-estimates with $\psi = \psi_a$.
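The halving and quartering pattern can be checked directly from the Cauchy rows of Table 1. The short sketch below simply takes differences between the simulated values and the two expansions and forms ratios of successive errors as $n$ doubles; the variable names are ours, and the simulated values carry the standard errors given in the table, so the ratios are only indicative.

```python
# Cauchy rows of Table 1 (a = .6), for n = 10, 20, 40.
simulation = [3.341, 2.714, 2.472]
first_order = [2.278, 2.278, 2.278]
second_order = [2.959, 2.619, 2.449]

# Errors of each expansion relative to the simulated value of n * variance.
err_first = [s - f for s, f in zip(simulation, first_order)]
err_second = [s - f for s, f in zip(simulation, second_order)]

# Ratio of the error at n to the error at 2n; about 2 means "halved",
# about 4 means "quartered".
ratio_first = [err_first[i] / err_first[i + 1] for i in range(2)]
ratio_second = [err_second[i] / err_second[i + 1] for i in range(2)]

for name, ratios in (("1st order", ratio_first), ("2nd order", ratio_second)):
    print(name, [round(r, 2) for r in ratios])
```

The first order ratios come out a little above 2 and the second order ratios close to 4, which is consistent with the conjectured form of the expansion.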

The first column of table 1 gives the distribution for which estimates of location are being made as well as the parameter $a$ of this M-estimate. The column labelled 'Variance approximation' gives a brief description of the values presented in each row. The last three columns are headed by the sample size for the estimates considered. The first row of each section presents the simulation size used for the Monte Carlo approximation of $n$ times the variance given in the second row. The standard error of this approximation is given in parentheses. The third and fourth rows give the first and second order approximations, respectively, of $n$ times the variance obtained from proposition 4.3.3.
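The first and second order entries of table 1 for the normal distribution with $a = 1$ can be reproduced by numerical integration of the terms in (4.3.9) and (4.3.10). The sketch below is an illustration under stated assumptions rather than the computation actually used: it relies on the closed form $I_u(3,3) = 10u^3 - 15u^4 + 6u^5$ for $\psi_a$, on composite Simpson integration over $[-a, a]$ (where all derivatives of $\psi_a$ are polynomial), and on hypothetical function names.

```python
import math

def simpson(f, lo, hi, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def variance_expansions(a, n):
    """First and second order approximations of n * Var(T(F_n)), F standard normal."""
    c = 15.0 / 16.0
    # psi_a and its derivatives on [-a, a], written in terms of z = x / a.
    psi = lambda x: c * ((x / a) - 2.0 * (x / a) ** 3 / 3.0 + (x / a) ** 5 / 5.0)
    d1 = lambda x: c / a * (1.0 - (x / a) ** 2) ** 2
    d2 = lambda x: -4.0 * c / a**2 * (x / a) * (1.0 - (x / a) ** 2)
    d3 = lambda x: -4.0 * c / a**3 * (1.0 - 3.0 * (x / a) ** 2)
    integ = lambda g: simpson(lambda x: g(x) * normal_pdf(x), -a, a)

    A = integ(d1)                                   # int psi' dF
    # psi^2 = 1/4 outside [-a, a]; P(|X| > a) = erfc(a / sqrt(2)).
    S = integ(lambda x: psi(x) ** 2) + 0.25 * math.erfc(a / math.sqrt(2.0))
    M21 = integ(lambda x: psi(x) ** 2 * d1(x))      # int psi^2 psi' dF
    D2 = integ(lambda x: d1(x) ** 2)                # int (psi')^2 dF
    M02 = integ(lambda x: psi(x) * d2(x))           # int psi psi'' dF
    C = integ(d3)                                   # int psi''' dF

    first = S / A**2
    correction = (-first - 2.0 * M21 / A**3 + 3.0 * S * D2 / A**4
                  + 3.0 * S * M02 / A**4 - S * S * C / A**5)
    return first, first + correction / n

for n in (10, 20, 40):
    first, second = variance_expansions(1.0, n)
    print(n, round(first, 3), round(second, 3))
```

The same routine can be evaluated over a grid of values of $a$ at negligible cost, which is the kind of comparison across a class of estimators for which the expansions are most convenient.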


For a given distribution function $F$ and positive $a$ we will consider an L-estimate as defined in definition 4.2.6 with

$$J_{a,F}(u) = \frac{\psi'_a(F^{-1}(u))}{\int \psi'_a\,dF} \qquad (5.2.5)$$

where $\psi'_a$ is the derivative of $\psi_a$ in (5.2.1). Figure 2 presents a graph of this function when $F$ is the standard normal distribution.

Figure 2. Graph of $J_{a,F}$ as defined in (5.2.5) with $a = 1.3$ and $F$ = standard normal cdf.

Note that the shape of $J_{a,F}$ changes as $a$ and $F$ change. For $F$ as in (5.2.2)-(5.2.4) it is easily verified that we can apply proposition 4.2.6 to obtain first and second order variance approximations of L-estimates corresponding to (5.2.5). From (4.2.4) and (4.3.6) we see that this L-estimate has the same influence curve as the M-estimate corresponding to $\psi_a$. This implies that the M- and L-estimates corresponding to $\psi_a$ have the same first order variance approximation (recall (3.5.10) and the fact that $T_1(F; Z_1) = h_1(F; X_1) - E[h_1(F; X_1)]$, where $h_1(F; \cdot)$ is the influence curve).

Table 2 shows results similar to those given in table 1. Note that the second order term can be important for comparing variances of corresponding M- and L-estimates, especially for the Cauchy distribution. In comparing the normal examples of tables 1 and 2 we see first order asymptotically equivalent estimators where the variance for $n = 10, 20, 40$ is smaller for the L-estimate than the M-estimate. In all other examples given in this section the M-estimate has outperformed its counterpart.

F, a                    Variance approximation    n = 10          n = 20          n = 40

Normal distribution     Simulation size           40,000          20,000          10,000
a = 1                   Simulation (Std. Err.)    1.181 (.001)    1.195 (.002)    1.198 (.003)
                        1st order                 1.208           1.208           1.208
                        2nd order                 1.182           1.195           1.202

Cauchy distribution     Simulation size           160,000         80,000          40,000
a = .6                  Simulation (Std. Err.)    3.404 (.016)    2.743 (.012)    2.484 (.012)
                        1st order                 2.278           2.278           2.278
                        2nd order                 3.006           2.642           2.460

Laplace distribution    Simulation size           160,000         80,000          40,000
a = 1.5                 Simulation (Std. Err.)    1.448 (.003)    1.356 (.005)    1.307 (.007)
                        1st order                 1.266           1.266           1.266
                        2nd order                 1.456           1.361           1.314

Table 2. Variance approximations for L-estimates with $J = J_{a,F}$.


Table  2  is  arranged  as  table  1.  The  first  column  gives  the  distribution  for  which  estimates 
of  location  are  being  made  as  well  as  the  parameter  a  of  this  L-estimate.  The  column  labelled 
‘Variance  approximation’  gives  a  brief  description  of  the  values  presented  in  each  row.  The  last 
three  columns  are  headed  by  the  sample  size  for  the  estimates  considered.  The  first  row  of  each 
section  presents  the  simulation  size  used  for  the  Monte  Carlo  approximation  of  n  times  the  variance 
given  in  the  second  row.  The  standard  error  of  this  approximation  is  given  in  parentheses.  The 
third  and  fourth  rows  give  the  first  and  second  order  approximations,  respectively,  of  n  times  the 
variance  obtained  from  proposition  4.2.6. 


When  one  wishes  to  compare  variances  of  estimators  from  a  class  such  as  that  described 
by  the  M-estimates  corresponding  to  (5.2.1)  or  the  L-estimates  of  (5.2.5)  the  expansions  presented 
here  can  be  particularly  useful.  Using  simulation  results  to  compare  more  than  a  few  members  of 
such  a  class  would  require  an  exorbitant  amount  of  computer  time.  The  variance  approximations 
of  propositions  4.2.6  and  4.3.3  take  little  computing  time  in  comparison.  Figures  3  and  4  present 
variance  approximations  for  M-estimates  corresponding  to  (5.2.1)  and  L-estimates  corresponding  to 
(5.2.5)  when  the  underlying  distribution  is  the  Cauchy  distribution  given  in  (5.2.3). 


In figures 3 and 4 the lower line gives the first order variance approximations for both estimates as a function of $a$. The middle and upper lines give the second order approximation for M-estimates and L-estimates, respectively. The M's and L's plotted are simulation approximations for M- and L-estimates, respectively. The standard errors of the simulation results are about .03 except for the L-estimates corresponding to $n = 10$ and the two largest values of $a$ where the standard errors are .08 and .15, respectively. Note that the scales on the y-axes of the two graphs differ.

Figure 4. Variance approximations for L- and M-estimates; n=20.

There are several things worth noting in these figures. First, the error in the second order approximation is about one fourth as large for $n = 20$ as for $n = 10$. The second order approximation is much improved over the first order approximation. In choosing an optimum value of $a$ it is clearly important to consider the second order approximation rather than just the first order approximation. Although the second order curves cross, it appears that for $n = 10$ or 20 there is no value of $a$ for which the L-estimate is better than the M-estimate. Finally, for L-estimates the approximation gets worse as $a$ increases, but for M-estimates this is not the case. This is undoubtedly related to the fact that for $a > F^{-1}(1 - 2/n)$ the second moment of the L-estimate is infinite.

Hodges and Lehmann (1970) discuss the comparison of variances of estimators which have the same first order efficiency. Assuming

$$n\,\mathrm{Var}_F(T^{(1)}(F_n)) = \sigma_1^2 + (1/n)\sigma_{21}^2 + o(1/n)$$

and

$$n\,\mathrm{Var}_F(T^{(2)}(F_n)) = \sigma_1^2 + (1/n)\sigma_{22}^2 + o(1/n)$$

where $\sigma_{22}^2 > \sigma_{21}^2$, they suggest a measure called deficiency (or asymptotic expected deficiency) defined by

$$d = \frac{\sigma_{22}^2 - \sigma_{21}^2}{\sigma_1^2}. \qquad (5.2.6)$$

They note that as $n$ becomes large, the number of additional observations $d_n$ needed to make

$$\mathrm{Var}_F(T^{(2)}(F_{n+d_n})) = \mathrm{Var}_F(T^{(1)}(F_n)) + o(1/n^2)$$

tends to $d$. As an example we let $T^{(1)}(F_n)$ be an M-estimate with $\psi = \psi_a$ and let $T^{(2)}(F_n)$ be an L-estimate with $J = J_{a,F}$ where $F$ is the Cauchy distribution of (5.2.3) and $a = 1.8$. The first coefficient of the variance approximation is 2.6 and the second coefficients are 5.2 (M) and 11.0 (L). This and (5.2.6) suggest that approximately $d = (11.0 - 5.2)/2.6 = 2.2$ additional observations are needed to get the same variance for the L-estimate as for the M-estimate as $n$ becomes large. Table 3 indicates that for this example $n$ must be moderately large before the variances of $T^{(1)}(F_n)$ and $T^{(2)}(F_{n+[d]})$ become very close. Note that even the lines labelled '2nd order' are not that close. For most of the other examples considered in this section it appears that the asymptotic expected deficiency is less than 1.
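The deficiency calculation is simple arithmetic, and the claim that $d$ extra observations equalize the variances to second order can be checked with the rounded coefficients quoted in the text. The sketch below is illustrative only; the helper names are ours, and non-integer sample sizes are used purely as a device for evaluating the two-term expansions.

```python
def deficiency(sigma1_sq, sigma21_sq, sigma22_sq):
    """Asymptotic expected deficiency (5.2.6)."""
    return (sigma22_sq - sigma21_sq) / sigma1_sq

def two_term_variance(sigma1_sq, sigma2_sq, n):
    """Var(T(F_n)) from the two-term expansion: (sigma1^2 + sigma2^2 / n) / n."""
    return (sigma1_sq + sigma2_sq / n) / n

# Rounded coefficients quoted for the Cauchy example with a = 1.8.
s1, s21, s22 = 2.6, 5.2, 11.0
d = deficiency(s1, s21, s22)

n = 1000
var_m = two_term_variance(s1, s21, n)        # M-estimate with n observations
var_l = two_term_variance(s1, s22, n + d)    # L-estimate with n + d observations
scaled_gap = n**2 * abs(var_l - var_m)       # shrinks like 1/n if d is the right offset
print(round(d, 2), scaled_gap)
```

With these coefficients $d$ is about 2.23, and the gap between the two variances, multiplied by $n^2$, is already small at $n = 1000$, in line with the $o(1/n^2)$ statement.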


Variance approximation      M-estimate

Sample size                 n = 8           n = 18          n = 38
Simulation size             8,000           4,000           2,000
Simulation (Std. Err.)      3.831 (.089)    2.974 (.055)    2.794 (.062)
1st order                   2.608           2.608           2.608
2nd order                   3.262           2.899           2.746

Variance approximation      L-estimate

Sample size                 n = 10          n = 20          n = 40
Simulation size             50,000          25,000          12,500
Simulation (Std. Err.)      4.857 (.146)    3.339 (.029)    2.918 (.027)
1st order                   2.608           2.608           2.608
2nd order                   3.705           3.156           2.882

Table 3. Deficiency example: Cauchy distribution, a = 1.8

Table 3 is arranged somewhat differently than tables 1 and 2. The M-estimate considered has the $\psi$ function given by (5.2.1). The L-estimate considered has the $J$ function given by (5.2.5). The column labelled 'Variance approximation' gives a brief description of the values presented in each row. The first row of each section presents the sample size for the estimate of interest. The second row of each section presents the simulation size used for the Monte Carlo approximation of $n$ times the variance given in the third row. The standard error of this approximation is given in parentheses. The fourth and fifth rows give the first and second order approximations, respectively, of $n$ times the variance obtained from propositions 4.2.6 and 4.3.3.

F, a                    Variance approximation    n = 10          n = 20          n = 40

Normal distribution     Simulation size           40,000          20,000          10,000
a = 1                   Simulation (Std. Err.)    1.164 (.001)    1.183 (.002)    1.192 (.003)
                        1st order                 1.208           1.208           1.208
                        2nd order                 1.182           1.195           1.202

Cauchy distribution     Simulation size           160,000         80,000          40,000
a = .6                  Simulation (Std. Err.)    3.411 (.016)    2.744 (.012)    2.485 (.012)
                        1st order                 2.278           2.278           2.278
                        2nd order                 3.006           2.642           2.460

Laplace distribution    Simulation size           160,000         80,000          40,000
a = 1.5                 Simulation (Std. Err.)    1.476 (.003)    1.373 (.005)    1.316 (.007)
                        1st order                 1.266           1.266           1.266
                        2nd order                 1.456           1.361           1.314

Table 4. Variance approximations for L-estimates with $J(i/(n+1))$ coefficients; $J = J_{a,F}$.

Table 4 is the same as table 2 except that the L-estimates simulated use coefficients $J(i/(n+1))$ instead of $\int_{(i-1)/n}^{i/n} J(u)\,du$, $i = 1, 2, \ldots, n$. We have normalized these coefficients so that they sum to one. The table is arranged as tables 1 and 2 are. The first column gives the distribution for which estimates of location are being made as well as the parameter $a$ of this L-estimate. The column labelled 'Variance approximation' gives a brief description of the values presented in each row. The last three columns are headed by the sample size for the estimates considered. The first row of each section presents the simulation size used for the Monte Carlo approximation of $n$ times the variance given in the second row. The standard error of this approximation is given in parentheses. The third and fourth rows give the first and second order approximations, respectively, of $n$ times the variance obtained from proposition 4.2.6.

The first thing to note in table 4 is that, as before, all expansions except for the Cauchy distribution with $n = 10$ appear quite good. The error of the first order approximation always appears to halve as $n$ doubles. The error of the second order approximation for the normal distribution goes down by a factor of two as $n$ quadruples. For the Cauchy distribution however, this error appears to quarter as $n$ doubles, as before. Because of the standard error of the Monte Carlo approximation, the error behavior for the Laplace example is unclear. Whether or not the second order approximation is a correct one when coefficients for an L-estimate are computed in this fashion is not readily apparent from these examples. We have not attempted to justify this expansion theoretically.
§5.3  Nonparametric  variance  estimates. 

Since  it  is  rarely  the  case  that  the  underlying  distribution  function  is  known,  we  wish  to 
give  a  brief  example  indicating  that  these  expansions  may  be  useful  in  approximating  variances  if 
we  substitute  the  empirical  distribution  function  in  our  formulas. 

The variance of a functional $T(F_n)$ can be considered a functional of the underlying
distribution function $F$, namely

$$\sigma^2(n, F) = \mathrm{Var}_F(T(F_n)). \tag{5.3.1}$$

In chapter 4 we have given formulas approximating $\sigma^2(n, F)$ by an expression of the form

$$\sigma^2(n, F) = \frac{1}{n}\sigma_1^2(F) + \frac{1}{n^2}\sigma_2^2(F) + o(1/n^2). \tag{5.3.2}$$

We briefly consider the nonparametric variance approximations

$$n\sigma^2(n, F) \approx \sigma_1^2(F_n) \tag{5.3.3}$$

and

$$n\sigma^2(n, F) \approx \sigma_1^2(F_n) + \frac{1}{n}\sigma_2^2(F_n). \tag{5.3.4}$$

These approximations are ‘delta method’ approximations and can also be considered as first and
second order approximations of the bootstrap estimate of variance, namely $\sigma^2(n, F_n)$.

Since the standard deviation of $\sigma_i^2(F_n)$, $i = 1, 2, \ldots$ is, in general, $O(1/\sqrt{n})$ one might
expect that the second (or any higher order) term of (5.3.2) would be useless. The reason we have
considered this term is that it provides a second order approximation of the bootstrap which Efron
(1981) has noted can be a better approximation than the first order delta method (note: Efron
usually refers to the first order delta method as the infinitesimal jackknife).

As an example of the type of calculation to be done we recall the first order variance
approximation of (4.2.7) and note that to obtain the right hand side of (5.3.3) we compute

$$\begin{aligned}
&\int\!\!\int \bigl(F_n(\min(x_1, x_2)) - F_n(x_1)F_n(x_2)\bigr)\,J(F_n(x_1))\,J(F_n(x_2))\,dx_1\,dx_2 \\
&\quad= \sum_{i=1}^{n-1} (X_{i+1:n} - X_{i:n})^2\,(i/n)(1 - i/n)\,(J(i/n))^2 \\
&\qquad+ 2\sum_{i=1}^{n-2} (X_{i+1:n} - X_{i:n})\,(i/n)\,J(i/n) \sum_{j=i+1}^{n-1} (X_{j+1:n} - X_{j:n})\,(1 - j/n)\,J(j/n).
\end{aligned}$$
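The order-statistic sum above is straightforward to evaluate numerically. The following is a minimal sketch, not the code used for the simulations reported here; the function name `first_order_delta_variance` and the choice of passing the weight function `J` as a Python callable are assumptions made for illustration. The inner sum over $j$ is accumulated as a running suffix sum so that the whole computation is linear in $n$ after sorting.

```python
from typing import Callable, Sequence


def first_order_delta_variance(sample: Sequence[float],
                               J: Callable[[float], float]) -> float:
    """Evaluate the order-statistic sum for sigma_1^2(F_n) of an L-estimate.

    Hypothetical helper: `J` is the weight function on (0, 1).
    """
    x = sorted(sample)
    n = len(x)
    lower = []      # (X_{i+1:n} - X_{i:n}) * (i/n) * J(i/n),      i = 1..n-1
    upper = []      # (X_{i+1:n} - X_{i:n}) * (1 - i/n) * J(i/n),  i = 1..n-1
    diagonal = 0.0  # first (single) sum of the display
    for i in range(1, n):
        gap = x[i] - x[i - 1]
        u = i / n
        weight = J(u)
        lower.append(gap * u * weight)
        upper.append(gap * (1.0 - u) * weight)
        diagonal += gap * gap * u * (1.0 - u) * weight * weight
    # Double sum: for each i, multiply lower[i] by the sum of upper[j], j > i.
    cross = 0.0
    suffix = 0.0
    for i in range(n - 2, -1, -1):
        cross += lower[i] * suffix
        suffix += upper[i]
    return diagonal + 2.0 * cross


# With J identically 1 the L-estimate is the sample mean, and the sum
# reduces to the variance of the empirical distribution (divisor n).
print(first_order_delta_variance([1.0, 2.0, 4.0, 7.0], lambda u: 1.0))
```

With $J \equiv 1$ the result can be checked by hand against $n^{-1}\sum (X_i - \bar X)^2$, which is a convenient sanity test before substituting a nontrivial weight function.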
F, a           Variance approximation    n = 10            n = 20           n = 40

Normal         Simulation size           40,000            20,000           10,000
distribution   Simulation (Std. Err.)    1.181 (.001)      1.195 (.002)     1.198 (.003)
a = 1          1st order                 1.208             1.208            1.208
               2nd order                 1.182             1.195            1.202
               1st delta (Std. Dev.)     1.377 (1.008)     1.298 (.675)     1.260 (.466)
               2nd delta (Std. Dev.)     1.364 (.779)      1.216 (.559)     1.242 (.426)
               1st delta CV              .74               .35              .17
               2nd delta CV              .44               .26              .15
               CV bound                  .22               .11              .05

Cauchy         Simulation size           160,000           80,000           40,000
distribution   Simulation (Std. Err.)    3.404 (.016)      2.743 (.012)     2.484 (.012)
a = .6         1st order                 2.278             2.278            2.278
               2nd order                 3.006             2.642            2.460
               1st delta (Std. Dev.)     5.553 (13.910)    3.342 (3.437)    2.759 (1.777)
               2nd delta (Std. Dev.)     -.057 (30.913)    4.114 (5.394)    2.466 (1.496)

Laplace        Simulation size           160,000           80,000           40,000
distribution   Simulation (Std. Err.)    1.448 (.003)      1.356 (.005)     1.307 (.007)
a = 1.5        1st order                 1.266             1.266            1.266
               2nd order                 1.456             1.361            1.314
               1st delta (Std. Dev.)     1.765 (1.381)     1.510 (.835)     1.387 (.541)
               2nd delta (Std. Dev.)     1.479 (1.135)     1.505 (.791)     1.431 (.535)

Table 5. Nonparametric variance approximations for L-estimates with $J = J_{a,F}$.

Table 5 is an expanded version of table 2. The first column gives the distribution for which
estimates of location are being made as well as the parameter $a$ of this L-estimate. The column
labelled ‘Variance approximation’ gives a brief description of the values presented in each row. The
last three columns are headed by the sample size for the estimates considered. The first row of each
section presents the simulation size used for the Monte Carlo approximation of $n$ times the variance
given in the second row. The standard error of this approximation is given in parentheses. The third
and fourth rows give the first and second order approximations, respectively, of $n$ times the variance
obtained from proposition 4.3.3. The rows labelled ‘1st delta’ give the average value of $\sigma_1^2(F_n)$ as in
(5.3.3) from the simulation. The rows labelled ‘2nd delta’ give the average value of the right hand
side of (5.3.4). Included in these rows are the estimated standard deviations of these estimators. It
can be argued that a lower bound for the coefficient of variation of any location and scale invariant
scale estimate for normal observations is that of $\hat\sigma^2$, namely $2/(n-1)$. This value is given in the
row labelled ‘CV bound’. The estimated coefficients of variation for the normal case are labelled ‘1st
delta CV’ and ‘2nd delta CV’.

For the Cauchy distribution these approximations appear to be quite poor. For the normal
and Laplace distributions the approximations appear reasonably good with the second order
approximation appearing to have both lower bias and lower variance than the first order
approximation; the only exception to this is that the bias is higher for the second order delta method for the
Laplace distribution with $n = 40$.

It  appears  that  further  study  of  second  and  higher  order  delta  method  approximations 
might  be  worthwhile.  From  table  5  it  appears  that  an  important  part  of  such  a  study  would  be  to 
formulate  estimates  of  the  variation  of  such  approximations. 


§5.4  Quantiles  and  trimmed  means. 


In  this  section  we  present  a  pair  of  ‘non-applications’  of  the  theory  we  have  developed.  One 
might  hope  that  the  moment  expansion  of  the  limit  of  a  set  of  estimators  is  the  same  as  the  limit  of  the 
moment  expansions  of  these  estimators.  If  this  were  true  one  could  obtain  variance  approximations 
for  trimmed  means,  quantiles,  and  other  estimators  which  do  not  satisfy  the  assumptions  of  the 
moment  convergence  propositions  that  we  have  given.  We  give  examples  where  the  limit  of  the 
variance  approximations  of  estimators  is  not  equal  to  the  variance  expansion  of  the  limit  of  the 
estimates.  It  will  be  seen,  however,  that  we  may  improve  a  variance  approximation  substantially  by 
using  an  (incorrect)  expansion  developed  by  taking  the  limit  of  expansions. 

There  are  many  simple  functionals  which  are  not  Frechet  differentiable.  Quantiles  and 
linear  combinations  of  quantiles  are  among  these.  We  show  this  for  a  particular  case  as  an  example 
of  the  type  of  problem  that  may  arise  with  a  ‘well  behaved’  functional. 


Let $T(F) = \inf\{q : F(q) \ge c\}$ where $c \in (0,1)$, $F \in \mathcal{D}$. Note that $T(F) = \infty$ if the defining
set is empty. Let $F(x) = x$ on $[0,1]$. We shall show that $T$ is not Frechet differentiable at $F$. For
$\lambda > 0$, $y \in \mathbb{R}$, let $F_{\lambda,y} = F + \lambda(\delta_y - F)$. For $y$ fixed, $y \ne c$, and $\lambda$ sufficiently small we have
$T(F_{\lambda,y}) = (c - \lambda\delta_y(c))/(1 - \lambda)$. The Gateaux differential of $T$ at $F$ in the direction of $\delta_y - F$ is by
definition

$$\lim_{\lambda \to 0} \frac{T(F_{\lambda,y}) - T(F)}{\lambda} = \lim_{\lambda \to 0} \frac{\lambda(c - \delta_y(c))}{\lambda(1 - \lambda)} = c - \delta_y(c).$$

If the Frechet differential exists, clearly it must be equal to the Gateaux differential. Thus to
show that $T$ is not Frechet differentiable at $F$ it suffices to show that for some $\epsilon > 0$ and any
$\delta > 0$ there exists $F_{\lambda,y}$ such that $\|F_{\lambda,y} - F\|_\infty < \delta$ and $|T(F_{\lambda,y}) - T(F) - \lambda(c - \delta_y(c))|
> \epsilon\,\|F_{\lambda,y} - F\|_\infty$. Fix $\epsilon \in (0, c)$. Let $\delta \in (0, 1 - c)$ be arbitrary. Let $\lambda < \delta$. If $0 < y < 1$
then $\|F_{\lambda,y} - F\|_\infty = \lambda\max(y, 1 - y) \le \lambda < \delta$. Suppose $c < y < c/(1 - \lambda)$. Then
$T(F_{\lambda,y}) = y$ and

$$T(F_{\lambda,y}) - T(F) - \lambda(c - \delta_y(c)) = y - c - \lambda c.$$

Since for each $\lambda > 0$

$$\inf_{c < y < c/(1-\lambda)} \frac{y - c - \lambda c}{\lambda} = -c < -\epsilon$$

and $\|F_{\lambda,y} - F\|_\infty \le \lambda$, the required inequality holds for $y$ sufficiently close to $c$, and
it follows that $T$ is not Frechet differentiable at $F$.
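The argument above can be illustrated numerically. The sketch below assumes $F$ uniform on $[0,1]$, $0 < y < 1$ and $0 < \lambda < 1 - c$; the function names `contaminated_quantile` and `remainder_ratio` are hypothetical and introduced only for this illustration. It computes the ratio of the linearization remainder to $\|F_{\lambda,y} - F\|_\infty$, once with $y$ held fixed and once with $y$ moving toward $c$ along with $\lambda$.

```python
def contaminated_quantile(c: float, lam: float, y: float) -> float:
    """T(F_{lam,y}) for F uniform on [0,1], F_{lam,y} = (1-lam)*F + lam*delta_y."""
    q_no_jump = c / (1.0 - lam)        # point where (1-lam)*q reaches level c
    if q_no_jump <= y:
        return q_no_jump               # level c is reached before the jump at y
    if (1.0 - lam) * y + lam >= c:
        return y                       # the jump at y carries F_{lam,y} past c
    return (c - lam) / (1.0 - lam)     # level c is reached after the jump


def remainder_ratio(c: float, lam: float, y: float) -> float:
    """|T(F_{lam,y}) - T(F) - lam*(c - delta_y(c))| / ||F_{lam,y} - F||_inf."""
    gateaux = c - (1.0 if c >= y else 0.0)       # c - delta_y(c)
    remainder = contaminated_quantile(c, lam, y) - c - lam * gateaux
    sup_norm = lam * max(y, 1.0 - y)
    return abs(remainder) / sup_norm


c = 0.5
for lam in (1e-2, 1e-4, 1e-6):
    fixed_y = remainder_ratio(c, lam, 0.8)                      # y held fixed
    moving_y = remainder_ratio(c, lam, c * (1.0 + lam / 2.0))   # c < y < c/(1-lam)
    print(lam, fixed_y, moving_y)
```

For fixed $y$ the ratio shrinks with $\lambda$, as Gateaux differentiability requires, while along the moving sequence it stays bounded away from zero, which is what rules out Frechet differentiability.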


Since a quantile is not Frechet differentiable we cannot apply theorem 3.4.1 to approximate
moments. We now try to find a second order variance approximation for the median by taking
the limit of expansions of M-estimates which approach the median. Let $\psi$ be a continuous
non-decreasing, non-constant, odd function on $\mathbb{R}$ such that for $x > 1$, $\psi(x) = 1/2$. Assume also that
$\psi'$ and $\psi''$ exist everywhere and that $\psi'''$ exists and is bounded everywhere except possibly at $\pm 1$.
For any positive $a$ let $\psi_a(x) = \psi(x/a)$. From proposition 4.3.3 we know that if $F$ is symmetric and
differentiable at $a$ and satisfies the necessary tail conditions then the variance of the M-estimate
corresponding to $\psi_a$ for a sample of size $n$ from $F$ may be written as in (4.3.10). Letting $a \to 0$ it
can be shown that if $F$ is symmetric and three times differentiable at 0 then this is


$$\left(\frac{1}{n} - \frac{5}{3n^2}\right)\frac{1}{4(f(0))^2} - \frac{f''(0)}{16n^2(f(0))^5} + o(n^{-2}) \tag{5.4.1}$$

provided $f(0) > 0$. David (1980), p. 81 gives an expansion for the variance of a quantile. In the
case of the median with $F$ symmetric the formula he presents reduces to

$$\left(\frac{1}{n} - \frac{2}{n^2}\right)\frac{1}{4(f(0))^2} - \frac{f''(0)}{16n^2(f(0))^5} + o(n^{-2}). \tag{5.4.2}$$

In this case as $a \to 0$ the estimators corresponding to $\psi_a$ converge to the median. Whereas the
moment expansions of (5.4.1) and (5.4.2) are not the same, they are very similar. It appears that
there is an ‘extra term’ when we do the calculation to obtain (5.4.1).
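To get a feel for the size of the second order correction, the expansion for the median can be evaluated for a specific density. The sketch below assumes the form $(1/n - k/n^2)/(4(f(0))^2) - f''(0)/(16n^2(f(0))^5)$ with the remainder dropped, where $k = 2$ corresponds to the reduction of David's formula in (5.4.2); the function name and the parameter `k` are introduced here for illustration only. For the standard normal density, $f(0) = 1/\sqrt{2\pi}$ and $f''(0) = -f(0)$.

```python
import math


def median_variance_expansion(n: int, f0: float, f2: float, k: float = 2.0) -> float:
    """(1/n - k/n^2)/(4 f0^2) - f2/(16 n^2 f0^5), with the o(n^-2) term dropped.

    f0 = f(0), f2 = f''(0); k = 2 is the coefficient assumed for (5.4.2).
    """
    return (1.0 / n - k / n**2) / (4.0 * f0**2) - f2 / (16.0 * n**2 * f0**5)


f0 = 1.0 / math.sqrt(2.0 * math.pi)   # standard normal density at 0
f2 = -f0                              # its second derivative at 0

for n in (10, 20, 40):
    first_order = 1.0 / (4.0 * f0**2)                 # n * Var to first order (= pi/2)
    second_order = n * median_variance_expansion(n, f0, f2)
    print(n, round(first_order, 4), round(second_order, 4))
```

In the normal case the second order value of $n$ times the variance simplifies to $\pi/2 + (\pi^2/4 - \pi)/n$, so the correction is negative and of relative size roughly $0.43/n$.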


The  trimmed  mean  is  another  case  where  we  might  try  to  get  a  variance  approximation 
by  taking  the  limit  of  variance  approximations  of  statistics  which  approach  the  trimmed  mean.  In 
this  case  we  will  have  three  times  Frechet  differentiable  functionals  approaching  a  trimmed  mean 
which  is  one  time  Frechet  differentiable.  Thus  we  have  a  ‘smoother’  situation  than  with  the  median 
above  where  the  limiting  functional  was  not  Frechet  differentiable.  The  approximation  obtained  in 
this  fashion  appears  likely,  once  again,  to  be  incorrect. 


The $\alpha$-trimmed mean is an L-statistic with weight function

$$J(u) = \begin{cases} \dfrac{1}{1 - 2\alpha}, & \text{if } \alpha < u < 1 - \alpha, \\ 0, & \text{otherwise.} \end{cases} \tag{5.4.3}$$


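Because $J$ is not differentiable at $\alpha$ and $1 - \alpha$ we may not apply proposition 4.2.6 to obtain second
order variance approximations. A smoothed version of this weight function is given by

$$J(u) = \begin{cases}
\dfrac{1}{1 - 2\alpha}, & \text{if } \alpha + \epsilon < u < 1 - \alpha - \epsilon, \\
\dfrac{1}{1 - 2\alpha}\, w\!\left(\dfrac{u - \alpha}{\epsilon}\right), & \text{if } \alpha - \epsilon \le u \le \alpha + \epsilon, \\
\dfrac{1}{1 - 2\alpha}\, w\!\left(\dfrac{1 - \alpha - u}{\epsilon}\right), & \text{if } 1 - \alpha - \epsilon \le u \le 1 - \alpha + \epsilon, \\
0, & \text{otherwise,}
\end{cases} \tag{5.4.4}$$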


where $w$ is defined on $[-1, 1]$ and has the following properties: 1) $w(-1) = 0$, $w(1) = 1$; 2) $w$ is
three times continuously differentiable on $[-1, 1]$ with $w'(-1) = w'(1) = 0$; 3) $w$ is symmetric about
0, i.e. $w(u) = 1 - w(-u)$.


Property 3 implies $\int_{-1}^{1} w(x)\,dx = 1$. This implies that if $J$ is defined as in (5.4.4) then
$\int_0^1 J(u)\,du = 1$. The L-statistic corresponding to this weight function will be referred to as an
$\epsilon$-smoothed, $\alpha$-trimmed mean. For any $\epsilon$-smoothed, $\alpha$-trimmed mean we have $J$ and $J^{(1)}$ continuous
and bounded on $[0, 1]$, and $J^{(2)}$ continuous and bounded except possibly at $\alpha \pm \epsilon$ and $1 - \alpha \pm \epsilon$. It
follows that if

$$0 < \liminf_{x \to \infty} \frac{-\log(1 - F(x))}{\log x}, \qquad 0 < \liminf_{x \to -\infty} \frac{-\log F(x)}{\log |x|} \tag{5.4.5}$$

and either $F^{-1}$ is continuous at $\alpha \pm \epsilon$ and $1 - \alpha \pm \epsilon$ or $w''(-1) = w''(1) = 0$ then we may apply
proposition 4.2.6 to obtain bias and first and second order variance approximations.
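A concrete smoothing function may help fix ideas. The sketch below uses the quintic $w(u) = \tfrac12 + (15u - 10u^3 + 3u^5)/16$, which is our choice for illustration and not one taken from the text; it satisfies properties 1) to 3) and in addition has $w''(-1) = w''(1) = 0$. The weight function follows the piecewise form of an $\epsilon$-smoothed, $\alpha$-trimmed mean, with $w((u - \alpha)/\epsilon)$ on the lower transition and $w((1 - \alpha - u)/\epsilon)$ on the upper one, and a midpoint rule checks numerically that it integrates to one.

```python
def w(u: float) -> float:
    """Illustrative smoothing function on [-1, 1] (an assumed choice).

    w(-1) = 0, w(1) = 1, w(u) = 1 - w(-u), and
    w'(u) = 15 (1 - u^2)^2 / 16, so w' and w'' vanish at -1 and 1.
    """
    return 0.5 + (15.0 * u - 10.0 * u**3 + 3.0 * u**5) / 16.0


def smoothed_trim_weight(u: float, alpha: float, eps: float) -> float:
    """Weight function of an eps-smoothed, alpha-trimmed mean (0 < eps <= alpha < 1/2)."""
    scale = 1.0 / (1.0 - 2.0 * alpha)
    if alpha + eps < u < 1.0 - alpha - eps:
        return scale
    if alpha - eps <= u <= alpha + eps:
        return scale * w((u - alpha) / eps)
    if 1.0 - alpha - eps <= u <= 1.0 - alpha + eps:
        return scale * w((1.0 - alpha - u) / eps)
    return 0.0


def midpoint_integral(g, a: float, b: float, m: int = 20000) -> float:
    """Composite midpoint rule with m subintervals."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))


total = midpoint_integral(lambda u: smoothed_trim_weight(u, 0.10, 0.05), 0.0, 1.0)
print(total)   # should be close to 1
```

Because this particular $w$ has vanishing second derivative at the endpoints, the resulting $J$ is twice continuously differentiable across the joins at $\alpha \pm \epsilon$ and $1 - \alpha \pm \epsilon$.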

We consider the case where $F$ is symmetric about zero, is two times differentiable at $F^{-1}(\alpha)$,
and satisfies (5.4.5). We let $\delta > 0$ be such that $f(x) = \frac{d}{dx}F(x)$ exists in $(F^{-1}(\alpha - \delta), F^{-1}(\alpha + \delta))$.
For any $\epsilon \in (0, \delta)$ it follows that $F^{-1}$ is continuous at $\alpha \pm \epsilon$, $1 - \alpha \pm \epsilon$ and we may apply proposition
4.2.6. For the first order variance term we will apply (3.5.9) rather than (4.2.7). Note that $T(F) = 0$.
From (4.2.4) and definition 3.3.1 we can show that if $z = \delta_x - F$ then


W*)H 


f*/(l  -2a), 

Ki(f-‘<i 


-«-o+ 


if  |*|<  /,-1(l  —  a  —  e), 
if  |  x  |>  F~~l(l  —  a  +  e), 

otherwise. 

(5.4.6) 


This  implies  that  the  first  order  variance  approximation  has  coefficient 
$$E\left[(T_1(F; Z_1))^2\right] = \cdots = \frac{2}{(1 - 2\alpha)^2} \int_0^{F^{-1}(1-\alpha)} x^2\,dF(x) + \frac{2\alpha}{(1 - 2\alpha)^2}\left(F^{-1}(1 - \alpha)\right)^2 + O(\epsilon). \tag{5.4.7}$$


Applying  (4.2.9),  (5.4.4)  and  the  properties  of  w,  a  series  of  straightforward  calculations  shows  that 
the  following  is  the  coefficient  of  1/n2  in  the  second  order  variance  approximation: 


■(X-.,—,  FW(i  -  ^ 

+ (I'»r  °)  C  (“+'V  -  f (*»)xi  -  sr(»»)K(f(ii<)~  °)<t«.ifai 
+ ;  -  «(^> 


.(r‘(“+ 

WF-i(a-€ 


1  (a+c)  ✓  \  _  , 

(F(x))2(1  -  F(x))w"(  ^ .  ]dx 

-«) 


l 


e  /  ^xi 

(5.4.8) 

Further  straightforward  calculations  show  that  if  F  is  two  times  differentiable  at  F~l(a)  then  this 
is  equal  to 


2a(l-a)F-Hl-a)f  f(F^(a))  _  \  <*2  , 

(1  -  2a)2/(F~i(a))  V  (/(^H**)))2  V  U  -  2a)(/(F-i(a)))2  + 


e  ->  0.  (5.4.9) 


Equations (5.4.7) and (5.4.9) suggest the following approximation for the $\alpha$-trimmed mean when $F$
is symmetric and two times differentiable at $F^{-1}(\alpha)$ and (5.4.5) holds:


1  2  ( [*'  1(1"a) 
n  (1  —  2a)2  yJo 


x2dF(x)  +  a(.F— *(1  -  a))2 


) 


1  (2a(l-a)F-'(l- a) (  f'(F~'(«))  _ 
n2  V  (1  -  2 a)2/(F-1(a))  V  (/(^(a)))2  J  + 


_ q2 

(l-2a)(/(/’-i(a)))2>/- 


(5.4.10) 


Distribution   % trim    n    Exact    1st order (Err.)    2nd order (Err.)    Monte Carlo approx. (Err.)

Normal         10        5    1.019    1.060 (.041)        1.031 (.012)        1.020 (.001)
                        10    1.053    1.060 (.007)        1.046 (-.007)       1.048 (-.005)
                        20    1.055    1.060 (.005)        1.053 (-.002)       1.056 (.001)
               25        5    1.145    1.195 (.050)        1.144 (-.001)       1.156 (.011)
                        10    1.164    1.195 (.031)        1.170 (.006)        1.148 (-.016)
                        20    1.186    1.195 (.009)        1.182 (-.004)       1.199 (.013)

Laplace        10        5    1.758    1.494 (-.264)       1.825 (.067)
                        10    1.617    1.494 (-.123)       1.659 (.042)
                        20    1.556    1.494 (-.062)       1.577 (.021)        1.60 (.04)
               25        5    1.599    1.227 (-.372)       1.766 (.167)
                        10    1.424    1.227 (-.197)       1.497 (.073)
                        20    1.228    1.227 (-.001)       1.362 (.134)        1.33 (.10)

Cauchy         10       20    8.282    4.771 (-3.511)      6.78 (-1.50)        7.3 (-1.0)
                        40             4.771               5.77                5.40
               25       10    4.498    2.546 (-1.952)      3.58 (-.92)         4.6 (.1)
                        20    3.182    2.546 (-.636)       3.06 (-.12)         3.1 (-.1)
                        40             2.546               2.80                2.61

Table 6. Variance approximations for trimmed means.


Table 6 contains exact values and various approximations for $n$ times the variance of
various trimmed means. We have used the approximations in (5.4.7) and (5.4.10) to compute the
approximations of $n$ times the variance given in the columns labelled ‘1st order’ and ‘2nd order’,
respectively. Some of the exact numbers were found in Gastwirth and Cohen (1970). Other exact
variances were computed using tables of variances and covariances of order statistics. These tables
are given by Sarhan and Greenberg (1964) (normal), Govindarajulu (1966) (Laplace), and Barnett
(1968) (Cauchy). In the last column of the table are Monte Carlo approximations of the variances
of trimmed means which can be found in Andrews et al. (1972). We do not consider the trimmed
mean for the Cauchy distribution with 10% trim and $n = 10$ as the true variance is infinite.
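The ‘1st order’ column of table 6 can be reproduced for the normal case from the limiting first order coefficient $\frac{2}{(1-2\alpha)^2}\bigl(\int_0^{F^{-1}(1-\alpha)} x^2\,dF(x) + \alpha (F^{-1}(1-\alpha))^2\bigr)$. The sketch below is an illustration under that assumption rather than the program used to build the table; the function name is hypothetical. It uses the closed form $\int_0^q x^2\phi(x)\,dx = \Phi(q) - \tfrac12 - q\phi(q)$, which follows from integration by parts, together with `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def first_order_trimmed_variance_normal(alpha: float) -> float:
    """First order approximation of n * Var for the alpha-trimmed mean, standard normal F."""
    normal = NormalDist()
    q = normal.inv_cdf(1.0 - alpha)                       # F^{-1}(1 - alpha)
    # Integral of x^2 * phi(x) over [0, q], by integration by parts.
    partial_second_moment = (normal.cdf(q) - 0.5) - q * normal.pdf(q)
    return 2.0 / (1.0 - 2.0 * alpha) ** 2 * (partial_second_moment + alpha * q * q)


for alpha in (0.10, 0.25):
    print(alpha, round(first_order_trimmed_variance_normal(alpha), 3))
```

The two printed values can be compared with the normal rows of the ‘1st order’ column for 10% and 25% trim.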

Looking at these numbers carefully suggests that the second order term of (5.4.10) is not
correct as the difference of the columns labelled ‘Exact’ and ‘Approx.’ often decreases only by a
factor of about two as $n$ doubles. Note also, however, that even these apparently incorrect second
order approximations can be a great improvement over first order expansions. The Monte Carlo
approximations given used variance reduction techniques and simulation sizes of 640 to 1000. The
error of these approximations is comparable to the error of the second order approximations given. We have
not attempted to rigorously derive a correct version of the second order variance approximation of
the trimmed mean. Because of the widespread interest in trimmed means such a derivation might
be worthwhile.